L3 - Naïve Bayes Flashcards
What are the Naïve Bayes learning outcomes?
Describe the basic principles of Bayes' theorem
Apply specialised methods and data structures needed to analyse text data in R
What is Bayes' theorem?
A statistical principle for combining prior knowledge of the classes with new evidence gathered from data
What is the formula for Bayes' theorem?
P(A|B) = P(A ∩ B) / P(B) = P(B|A) * P(A) / P(B)
How can conditional probability be used to test whether two events are independent?
A and B are independent if P(A|B) = P(A), i.e. knowing B occurred does not change the probability of A
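A minimal R sketch of this check, using made-up probabilities (every number here is hypothetical):
```r
p_a       <- 0.5   # P(A)
p_b       <- 0.4   # P(B)
p_a_and_b <- 0.2   # P(A ∩ B)

p_a_given_b <- p_a_and_b / p_b        # P(A|B) = P(A ∩ B) / P(B)
isTRUE(all.equal(p_a_given_b, p_a))   # TRUE: P(A|B) = 0.5 = P(A), so A and B are independent
```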
How can P(A|B) and P(B|A) be rewritten in terms of the intersection?
P(A ∩ B) = P(A|B) * P(B)
P(B ∩ A) = P(B|A) * P(A)
Give the proof for (1)
(1) P(A|B) = P(A ∩ B) / P(B)
Similarly: (2) P(B|A) = P(B ∩ A) / P(A)
So we have: (3) P(A ∩ B) = P(A|B) * P(B) and (4) P(B ∩ A) = P(B|A) * P(A)
And we know: (5) P(A ∩ B) = P(B ∩ A)
So, substituting (4) and (5) into the numerator of (1): (6) P(A|B) = P(B|A) * P(A) / P(B)
Explain the components of Bayes' theorem
Posterior probability - P(A|B)
Likelihood - P(B|A)
A priori probability - P(A)
Marginal likelihood - P(B)
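A small R sketch putting hypothetical numbers to these four components (A = "message is spam", B = "message contains 'Viagra'"; all figures are made up):
```r
prior      <- 0.20   # a priori probability P(A): 20% of messages are spam
likelihood <- 0.40   # likelihood P(B|A): 40% of spam messages contain 'Viagra'
marginal   <- 0.10   # marginal likelihood P(B): 10% of all messages contain 'Viagra'

posterior <- likelihood * prior / marginal   # posterior probability P(A|B)
posterior   # 0.8: given 'Viagra', the message is 80% likely to be spam
```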
Why is it called Naïve Bayes?
It makes the strong assumption of independence among the features
i.e. the features do not affect one another
Explain the steps involved for: when we observe a message that contains 'Viagra' and 'Unsubscribe' but neither 'Money' nor 'Groceries', what is the probability that this message is spam?
Assumption - naïve Bayes assumes independence amongst the features (i.e. P(A|B) = P(A))
Just focus on the numerator to start with
(1) For independent events we have: P(A ∩ B) = P(A) * P(B)
(2) Rewrite P(B|A) in the numerator as a product over the observed features: P(Viagra|Spam) * P(Unsubscribe|Spam) * P(¬Money|Spam) * P(¬Groceries|Spam) * P(Spam)
Look up each feature inside the spam class of the likelihood table
(3) Calculate the ham numerator using the same formula but with the ham class
(4) Compare the spam and ham numerators
(5) Denominator (normalisation/scaling) - divide each class's numerator by the sum of the spam and ham numerators, i.e. the probability of the features under spam over the probability of the features under ham and spam (see the R sketch below)
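A worked R sketch of steps (1)-(5), using hypothetical likelihood-table values (none of these numbers come from the original card):
```r
# Class priors (hypothetical)
p_spam <- 0.20
p_ham  <- 0.80

# P(word present | class) from an imagined likelihood table
lik_spam <- c(viagra = 0.40, unsubscribe = 0.30, money = 0.25, groceries = 0.05)
lik_ham  <- c(viagra = 0.01, unsubscribe = 0.10, money = 0.20, groceries = 0.15)

# Message: 'Viagra' and 'Unsubscribe' present; 'Money' and 'Groceries' absent.
# Numerator per class: product of per-feature probabilities times the prior
num_spam <- lik_spam["viagra"] * lik_spam["unsubscribe"] *
  (1 - lik_spam["money"]) * (1 - lik_spam["groceries"]) * p_spam
num_ham <- lik_ham["viagra"] * lik_ham["unsubscribe"] *
  (1 - lik_ham["money"]) * (1 - lik_ham["groceries"]) * p_ham

# Normalisation: divide by the sum of the class numerators
unname(num_spam / (num_spam + num_ham))   # ~0.97: the message is almost certainly spam
```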
What is normalisation/scaling for naïve Bayes classifiers?
The probability of the features under the class of interest divided by the total probability of the features across all classes
P(A|features) = P(features|A) * P(A) / (P(features|A) * P(A) + P(features|B) * P(B))
What is the Naïve Bayes classification algorithm? What are the components?
P(CL|F1,…,Fn) = (1/Z) * P(CL) * P(F1|CL) * … * P(Fn|CL)
CL = class label
F1…Fn = the n features
1/Z = scaling factor (the normalisation constant)
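A tiny R sketch of this product form; the function name is hypothetical, and the numbers reuse the spam example above:
```r
# Unnormalised P(CL|F1..Fn): the prior P(CL) times the product of the P(Fi|CL)
unnorm_score <- function(prior, lik) prior * prod(lik)

scores <- c(spam = unnorm_score(0.20, c(0.40, 0.30, 0.75, 0.95)),
            ham  = unnorm_score(0.80, c(0.01, 0.10, 0.80, 0.85)))

# 1/Z is whatever constant makes the class scores sum to 1
scores / sum(scores)
```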
How can we find P(B|features) when we have already calculated P(A|features)?
Use the same equation as for P(A|features), but substitute the values from the likelihood table for the alternative class label
How does classification work for the Naïve Bayes algorithm?
Training - calculate the likelihood tables
Testing - given new, unseen data:
(1) Find its probability of belonging to each class using the likelihood tables
(2) Pick the most probable class (i.e. whichever class is more likely after normalisation)
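A minimal R train/test sketch using an off-the-shelf implementation; it assumes the e1071 package and a tiny, made-up training set (none of this is from the original card):
```r
library(e1071)   # provides naiveBayes()

# Hypothetical training data: word presence per message plus its class
train <- data.frame(
  viagra      = factor(c("yes", "no",  "yes", "no")),
  unsubscribe = factor(c("yes", "yes", "no",  "no")),
  class       = factor(c("spam", "ham", "spam", "ham"))
)

# Training: builds the likelihood tables from the data
model <- naiveBayes(class ~ ., data = train)

# Testing: score a new, unseen message
new_msg <- data.frame(
  viagra      = factor("yes", levels = c("no", "yes")),
  unsubscribe = factor("no",  levels = c("no", "yes"))
)
predict(model, new_msg, type = "raw")    # normalised probability of each class
predict(model, new_msg, type = "class")  # the most probable class
```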
How does Naïve Bayes classify?
Picks the most probable class given the observed features (i.e. whichever class is more likely after normalisation)
(1) Calculate the posterior probability for each class.
(2) The class with the highest posterior probability is the outcome of prediction.
NB: you calculate the posterior for both classes and pick the one whose probability is higher
Difference between the classifier and normalisation?
Classifier - picks the most probable class given the features
Normalisation - takes the probabilities of the classes and converts them into percentages
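A one-line R sketch of that conversion, reusing the hypothetical numerators from the spam example above:
```r
scores <- c(spam = 0.0171, ham = 0.000544)   # unnormalised class scores
round(100 * scores / sum(scores), 1)         # spam 96.9%, ham 3.1%
```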