Part 2: Naive Bayes Flashcards
Idea behind Naive Bayes
For patients who suffer from a certain disease, we know the probability of a symptom given the disease: P(symptom|disease). We also know the prior probabilities P(disease) and P(symptom). With this information we can compute P(disease|symptom) using Bayes' theorem: P(disease|symptom) = P(symptom|disease) * P(disease) / P(symptom).
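A minimal numerical sketch of this computation, with made-up values for the three probabilities:

```python
# Hypothetical probabilities, chosen only to illustrate Bayes' rule.
p_symptom_given_disease = 0.9
p_disease = 0.01
p_symptom = 0.08

# Bayes' rule: P(disease|symptom) = P(symptom|disease) * P(disease) / P(symptom)
p_disease_given_symptom = p_symptom_given_disease * p_disease / p_symptom
print(p_disease_given_symptom)  # 0.1125
```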
Likelihood
Similar to a probability, but not a true probability. If we know the likelihoods x1 (for class 1) and x2 (for class 2), then we know P1/P2 = x1/x2 and P1 + P2 = 1, which is easy to solve. Multiplying the likelihoods of all classes by the same number does not change the resulting probabilities (this is why the shared denominator in Bayes' theorem can be ignored).
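A small sketch (with hypothetical likelihood values) of how the two equations are solved by simple normalization:

```python
# Hypothetical likelihoods for class 1 and class 2.
x1, x2 = 0.006, 0.002

# From P1/P2 = x1/x2 and P1 + P2 = 1 it follows that P_i = x_i / (x1 + x2).
p1 = x1 / (x1 + x2)
p2 = x2 / (x1 + x2)
print(p1, p2)  # 0.75 0.25
```

Scaling both x1 and x2 by the same factor leaves p1 and p2 unchanged, which is exactly why the common denominator can be dropped.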
Assumptions Naive Bayes
- Attributes are equally important.
- Statistical independence (given the class value), i.e., knowing the value of one attribute says nothing about the value of another attribute (within the same class).
The independence assumption is strong (and often invalid in practice), but Naive Bayes still works well in many practical cases: even when the probability estimates are off, the correct class often still receives the maximum probability (see the sketch below).
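A minimal sketch of what the independence assumption buys us: the class likelihood factorizes into a product of per-attribute probabilities. The probability tables below are hypothetical values in the style of the classic weather example.

```python
# Hypothetical conditional probabilities P(attribute value | class) and class priors.
p_attr_given_class = {
    "yes": {"outlook=sunny": 2 / 9, "windy=true": 3 / 9},
    "no":  {"outlook=sunny": 3 / 5, "windy=true": 3 / 5},
}
prior = {"yes": 9 / 14, "no": 5 / 14}

def class_likelihood(cls, attrs):
    # Naive factorization: prior times the product of P(attribute | class).
    result = prior[cls]
    for a in attrs:
        result *= p_attr_given_class[cls][a]
    return result

record = ["outlook=sunny", "windy=true"]
scores = {c: class_likelihood(c, record) for c in prior}
print(max(scores, key=scores.get), scores)
```

Even when the attributes are in fact correlated, the class with the highest score is often still the correct one.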
Estimated probability = 0
What if an attribute value never occurs together with some class value (e.g. outlook = overcast with class no)?
- The estimated probability will be zero.
- The posterior probability will then also be zero, no matter how likely the other attribute values are.
The remedy is to add 1 to the count of every attribute value-class combination (the Laplace estimator). As a result, the estimated probabilities are never zero (see the sketch below).
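A short sketch of the Laplace estimator on hypothetical counts for one attribute within one class:

```python
# Hypothetical counts of outlook values within class "no"; "overcast" never occurs.
counts = {"sunny": 3, "overcast": 0, "rainy": 2}
k = len(counts)                  # number of distinct attribute values
total = sum(counts.values())

for value, count in counts.items():
    p_plain = count / total                 # can be zero (here: overcast)
    p_laplace = (count + 1) / (total + k)   # never zero
    print(value, round(p_plain, 3), round(p_laplace, 3))
```

With the Laplace estimator the overcast/no probability becomes 1/8 instead of 0, so a single unseen value no longer forces the whole posterior to zero.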
Numeric attributes
- If an attribute is numeric, e.g. salary or temperature, the exact value of a new record is unlikely to appear in the training set.
- Therefore we estimate the distribution of the attribute within each class and then compute the likelihood of each class as before.
- For simplicity we usually assume a normal distribution (see the sketch after this list).
- Why does this work even if the distribution is not normal? As long as the class with the maximum likelihood does not change, we still predict the right class; occasionally this causes a small error.
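A minimal sketch of the normal-distribution approach for a numeric attribute, using hypothetical temperature values for class "yes":

```python
import math

def gaussian_pdf(x, mean, std):
    # Density of a normal distribution; used as the likelihood of the value x.
    return math.exp(-((x - mean) ** 2) / (2 * std ** 2)) / (math.sqrt(2 * math.pi) * std)

# Hypothetical temperatures observed together with class "yes".
temps_yes = [83, 70, 68, 64, 69, 75, 75, 72, 81]
mean = sum(temps_yes) / len(temps_yes)
std = (sum((t - mean) ** 2 for t in temps_yes) / (len(temps_yes) - 1)) ** 0.5

# Contribution of a new temperature of 66 to the likelihood of class "yes".
print(gaussian_pdf(66, mean, std))
```

The same is done for every class; the densities then take the place of the count-based probabilities in the product.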
Overview Naive Bayes
- Probability estimates (non-deterministic classification).
- Used for discrete and numeric data.
- Simple and relatively accurate (even if the independence assumption is clearly violated).
+ Why? Because classification does not require accurate probability estimates, as long as the maximum probability is assigned to the correct class.
- Note also: many numeric attributes are not normally distributed, yet the method still works.
Naive Bayes is not so naive
- Naive Bayes: took first and second place several times in the KDD Cup competition, among 16 (then) state-of-the-art algorithms.
- Robust to irrelevant features.
+ Irrelevant features cancel each other out without affecting the results.
+ Decision Trees and Nearest-Neighbor methods, in contrast, can suffer heavily from this.
- Very good in domains with many equally important features.
+ Decision Trees suffer from fragmentation in such cases, especially if there is little data.
- A good, dependable baseline for text classification.
- Optimal if the Independence Assumptions hold:
+ If the assumed independence is correct, then it is the Bayes optimal classifier for the problem.
- Very fast.
- Low storage requirements.