L3 - Naïve Bayes Flashcards
What are the Naïve Bayes learning outcomes?
Describe the basic principles of Bayes Theorem
Apply specialised methods and data structures needed to analyse text data in R
What is Bayes theorem?
A statistical principle for combining prior knowledge of the classes with new evidence gathered from data
What is the formula for Bayes' theorem?
P(A|B) = P(A∩B) / P(B) = P(B|A) * P(A) / P(B)
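A minimal worked sketch of the formula in R; all three input probabilities are made up for illustration:

```r
p_A   <- 0.20  # P(A): a priori probability (made up)
p_B_A <- 0.90  # P(B|A): likelihood (made up)
p_B   <- 0.30  # P(B): marginal likelihood (made up)

# Bayes' theorem: P(A|B) = P(B|A) * P(A) / P(B)
p_A_B <- p_B_A * p_A / p_B
p_A_B  # 0.6
```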
How do you test whether two events are independent using conditional probability?
A and B are independent if P(A|B) = P(A), i.e. knowing B tells you nothing about A
Rewrite the formulas for P(A|B) and P(B|A) in terms of the intersection.
P(A∩B)= P(A|B) * P(B)
P(B∩A)= P(B|A) * P(A)
Give the proof for (1)
(1) P(A|B) = P(A∩B) / P(B)
Similarly: (2) P(B|A) = P(B∩A) / P(A)
So we have: (3) P(A∩B) = P(A|B) * P(B) and (4) P(B∩A) = P(B|A) * P(A)
And we know: (5) P(A∩B) = P(B∩A)
So, substituting (4) into (1) using (5), we get: (6) P(A|B) = P(B|A) * P(A) / P(B)
Explain the components of Bayes' theorem?
Posterior probability - P(A|B): the probability of the class given the observed evidence
Likelihood - P(B|A): the probability of the evidence given the class
A priori probability - P(A): the probability of the class before seeing any evidence
Marginal likelihood - P(B): the overall probability of the evidence, used for scaling
Why is it called Naïve Bayes?
It makes the strong assumption that the features are independent:
the features do not affect one another
Explain the steps involved when we observe a message that contains 'Viagra' and 'Unsubscribe' but neither 'Money' nor 'Groceries'. What is the probability that this message is spam?
Assumption - naïve Bayes assumes independence amongst features (i.e. P(A|B) = P(A))
Just focus on the numerator to start with:
(1) For independent events we have P(A∩B) = P(A) * P(B)
(2) Rewrite P(B|A) in the numerator as a product over the observed features: P(Viagra|Spam) * P(¬Money|Spam) * P(¬Groceries|Spam) * P(Unsubscribe|Spam) * P(Spam), where ¬ means the word is absent
Look up each feature inside the spam class (present or absent)
(3) Calculate the ham numerator using the same formula but with the ham class
(4) Compare the spam and ham scores
(5) Denominator (normalisation/scaling): divide each numerator by the total probability of the observed features, i.e. the spam numerator plus the ham numerator (worked through in the R sketch below)
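The whole calculation can be sketched in R; every likelihood-table value and prior below is made up purely for illustration:

```r
# Hypothetical likelihood tables P(word | class) and class priors (all made up)
p_spam   <- 0.2
p_ham    <- 0.8
lik_spam <- c(viagra = 0.20, money = 0.50, groceries = 0.05, unsub = 0.60)
lik_ham  <- c(viagra = 0.01, money = 0.10, groceries = 0.30, unsub = 0.20)

# Message contains 'Viagra' and 'Unsubscribe' but not 'Money' or 'Groceries',
# so absent words contribute (1 - P(word | class))
num_spam <- lik_spam["viagra"] * (1 - lik_spam["money"]) *
  (1 - lik_spam["groceries"]) * lik_spam["unsub"] * p_spam
num_ham  <- lik_ham["viagra"] * (1 - lik_ham["money"]) *
  (1 - lik_ham["groceries"]) * lik_ham["unsub"] * p_ham

# Normalisation: divide by the total probability of the observed features
unname(num_spam / (num_spam + num_ham))  # ~0.92 with these made-up numbers
```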
What is normalisation/scaling for naïve Bayes classifiers?
Dividing each class's unnormalised score by the sum of the scores across all classes, so the resulting posteriors sum to 1:
P(A|features) = score(A) / (score(A) + score(B))
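The scaling step itself is one line in R, reusing the made-up numerators from the spam example above:

```r
scores <- c(spam = 0.0114, ham = 0.001008)  # unnormalised numerators (made up)

# Divide by the sum so the posteriors add up to 1
scores / sum(scores)  # spam ~0.92, ham ~0.08
```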
What is the naïve Bayes classification algorithm? What are its components?
P(C_L | F_1…F_n) = 1/Z * P(C_L) * P(F_1|C_L) * … * P(F_n|C_L)
C_L = class label
F_1…F_n = the n features
1/Z = scaling factor
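A minimal R sketch of the algorithm; the function names and the vector of likelihoods are assumptions for illustration:

```r
# Unnormalised score for one class: P(CL) * P(F1|CL) * ... * P(Fn|CL)
nb_score <- function(prior, likelihoods) prior * prod(likelihoods)

# The 1/Z scaling: normalise the per-class scores so they sum to 1
nb_posterior <- function(scores) scores / sum(scores)

# Made-up priors and per-feature likelihoods for two classes
scores <- c(spam = nb_score(0.2, c(0.20, 0.50, 0.60)),
            ham  = nb_score(0.8, c(0.01, 0.10, 0.20)))
nb_posterior(scores)
```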
How can we find P(B|features) when we have calculated P(A|features)?
Use the same equation as for P(A|features), but replace the values with those from the likelihood table for the alternative class label
How does classification work for the naïve Bayes algorithm?
Training - calculate likelihood tables
Testing - given new unseen data
(1) Find the probability of it belonging to each class using the likelihood tables
(2) Pick the most probable class (i.e. whichever class is more likely after normalisation); the training step is sketched below
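A sketch of the training step in R, building the likelihood table and priors from a made-up toy dataset:

```r
# Toy training data (made up): one row per message
train <- data.frame(
  class  = c("spam", "spam", "ham", "ham", "ham"),
  viagra = c(TRUE, TRUE, FALSE, FALSE, TRUE)
)

# Training: likelihood table P(viagra | class) and the class priors
lik_table <- prop.table(table(train$class, train$viagra), margin = 1)
priors    <- prop.table(table(train$class))

lik_table  # each row is a class and sums to 1
priors     # P(ham), P(spam)
```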
How does Naïve Bayes classify?
Picks the most probable class given the observed features (i.e. whichever is more likely after normalisation)
(1) Calculate the posterior probability for each class.
(2) The class with the highest posterior probability is the outcome of prediction.
Note: you calculate the posterior for both classes and pick the one with the higher probability
Difference between classifier and normalisation?
The classifier makes the decision: it picks the most probable class. Normalisation takes the unnormalised class scores and converts them into percentages (probabilities that sum to 100%), which the classifier then compares
Why is KNN lazy?
KNN does no real training; it just compares the features of a new observation with stored examples at classification time
Naïve Bayes, in contrast, learns during training by building the likelihood tables it uses later
Name one problem with naïve Bayes?
If one of the features has a zero count for a class (e.g. 0/20), its likelihood is 0 and the whole product becomes 0
A single zero-probability feature overrides the evidence from all the other features
How do you overcome this problem?
Use the Laplace estimator (smoothing)
It guarantees non-zero probabilities by adding a small number to each of the feature counts
Explain how to overcome the problem in more detail?
Numerator - add 1 (or another small count) to each feature count
Denominator - balance it by increasing the denominator by a corresponding amount so the probabilities still sum to 1 (the amount doesn't have to be 1)
Do not change the prior knowledge (i.e. the a priori probabilities of A or B); see the sketch below
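A small sketch in R with made-up counts, assuming a present/absent feature so the denominator is balanced by 2:

```r
# Made-up word counts from the 20 spam messages in a training set
counts <- c(viagra = 0, money = 4, unsub = 12)
n_spam <- 20

# Without smoothing, 'viagra' gets a likelihood of 0 and zeroes the product
counts / n_spam

# Laplace smoothing: add 1 to each count and balance the denominator
# (+2 here, one for each value of a present/absent feature)
(counts + 1) / (n_spam + 2)
```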
How do you use naïve Bayes with continuous features?
Discretisation (binning)
Categorises the numeric data into different bins
How to set bins?
Prior knowledge (e.g. spam is more likely in the daytime, so set up bins for day and night)
Or simply use quantiles (see the sketch below)
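Both approaches are one call to cut() in R, using a made-up hour-of-arrival feature:

```r
# Made-up continuous feature: hour of day each message arrived
hours <- c(1, 3, 9, 10, 11, 14, 15, 20, 22, 23)

# Bins from prior knowledge: night (midnight-6am), day, evening
cut(hours, breaks = c(0, 6, 18, 24),
    labels = c("night", "day", "evening"))

# Bins from quantiles: four equal-sized groups
cut(hours, breaks = quantile(hours, probs = seq(0, 1, 0.25)),
    include.lowest = TRUE)
```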
Name some strengths of naïve Bayes? Explain.
(1) Robust to irrelevant features (some algorithms affected by unusual features)
For example, you might include eye colour as a feature when classifying gender. Eye colour is completely irrelevant to gender, but because the classifier works on probabilities this bears out in the evidence: males and females will have nearly identical likelihoods of blue eyes, so the feature contributes almost nothing to either class
(2) Robust to missing data - some classifiers discard the whole observation when data is missing; naïve Bayes just drops the one missing feature (e.g. the effective sample size for that single attribute is reduced by 1, rather than the whole observation being discarded)
Name and explain some weaknesses of naïve Bayes?
(1) Strong assumption that all features are independent - a faulty assumption, since such independence rarely exists in the real world
(In practice, though, it is not important to obtain precise probabilities so long as the predictions are accurate)
(2) Strong predictions but weak probability estimates - because of the independence assumption, some other algorithms are better at providing probability estimates
What are Bayesian methods, and what makes one of them 'naïve'?
Bayes classifiers are classification methods based on Bayes' theorem
The naïve Bayes classifier is the simplest among them; it assumes the features are independent