Chapter 9 Flashcards
Joint Probability and Independence
What is the formula for joint probability?
- Formula for joint probability using conditional probability:
- p(AB) = p(A)*p(B|A)
- Accounts for dependencies between events
- The order of A and B is arbitrary: equivalently, p(AB) = p(B)*p(A|B)
- For completely independent events
- Knowing that event A occurred does not tell us anything about event B (and vice versa)
- Therefore p(B|A) is simply p(B), and the formula reduces to p(AB) = p(A)*p(B)
- Independent events: the occurrence of one event does not tell you anything about the likelihood that the other event occurs
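A minimal Python sketch of the joint-probability formula; all probability values are hypothetical, chosen only to contrast the dependent and independent cases:

```python
# Joint probability via conditional probability: p(AB) = p(A) * p(B|A)
# All probability values below are hypothetical, for illustration only.

p_A = 0.30          # p(A)
p_B_given_A = 0.50  # p(B|A): B depends on A
p_B = 0.20          # p(B): unconditional probability of B

# Dependent case: use the conditional probability.
p_AB_dependent = p_A * p_B_given_A    # 0.15

# Independent case: p(B|A) = p(B), so the formula reduces to p(A) * p(B).
p_AB_independent = p_A * p_B          # 0.06

print(f"dependent:   p(AB) = {p_AB_dependent:.2f}")
print(f"independent: p(AB) = {p_AB_independent:.2f}")
```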
What is the equation for Bayes' rule?
Bayes' rule makes it possible to compute the probability of a hypothesis H given some evidence E by looking at the probability of the evidence given the hypothesis p(E|H), as well as the unconditional probabilities of the hypothesis p(H) and the evidence p(E):
- p(H|E) = p(E|H) * p(H) / p(E)
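A small numeric sketch of Bayes' rule in Python, with hypothetical values for the prior and the conditional probabilities of the evidence:

```python
# Bayes' rule: p(H|E) = p(E|H) * p(H) / p(E)
# Hypothetical numbers for illustration only.
p_H = 0.01             # prior probability of the hypothesis
p_E_given_H = 0.90     # probability of the evidence if H is true
p_E_given_notH = 0.05  # probability of the evidence if H is false

# p(E) expanded over H and not-H (law of total probability).
p_E = p_E_given_H * p_H + p_E_given_notH * (1 - p_H)

p_H_given_E = p_E_given_H * p_H / p_E
print(f"p(H|E) = {p_H_given_E:.3f}")  # ~0.154: the evidence raises the prior
```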
How can Bayes' rule be applied to classification?
- Bayes' rule for classification: p(C = c | E) = p(E | C = c) * p(C = c) / p(E)
- p(C = c) is the "prior" probability (i.e., the base rate of c), which is the probability we assign to the class before seeing any evidence, determined by:
- A: Intuition
- B: Belief based on a previous estimation
- C: Base rate of class distribution in the data
- p(E | C = c) is the percentage of examples of class c that have feature vector E
- p(E) is the likelihood of the feature representation E among all examples
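A quick sketch of option C, estimating the prior p(C = c) as the base rate of each class; the labels below are a hypothetical toy dataset:

```python
from collections import Counter

# Hypothetical labels; the prior p(C = c) is simply each class's base rate.
labels = ["spam", "ham", "ham", "spam", "ham", "ham", "ham", "spam"]

counts = Counter(labels)
priors = {c: counts[c] / len(labels) for c in counts}
print(priors)  # {'spam': 0.375, 'ham': 0.625}
```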
What are the difficulties with the application of the Bayes Rule?
- The values needed for classification can be estimated from training data
- However, we don't know p(E | C = c): the conditional probability of the complete feature vector E given class c
- This quantity is very specific and therefore difficult to measure (most exact feature combinations occur rarely, if ever, in the training data)
- Dealt with by making assumptions about the independence of the features
What is the actual Naive Bayes equation?
- Basis for the naïve Bayes classifier: p(C = c | E) = p(e1 | c) * p(e2 | c) * ... * p(ek | c) * p(c) / p(E)
- Classifies a new example by estimating the probability that the instance belongs to each class
- Reports the class with the highest probability
- P(E) in the denominator is difficult to estimate from the data, but:
- If we don't need probability estimates, P(E) does not have to be estimated: it is the same for every class, so we simply look for the class that produces the largest numerator
- If we do need probability estimates, P(E) can be re-expressed using the independence assumption: P(E) = sum over all classes c' of p(c') * p(e1 | c') * ... * p(ek | c')
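A minimal sketch of the resulting decision rule, assuming hypothetical priors and per-feature likelihoods for binary features; since P(E) is the same for every class, only the numerators are compared (in log space, to avoid underflow):

```python
import math

# Hypothetical parameters: priors p(c) and per-feature likelihoods
# p(e_i = 1 | c) for three binary features.
priors = {"pos": 0.4, "neg": 0.6}
likelihoods = {
    "pos": [0.8, 0.1, 0.6],
    "neg": [0.3, 0.5, 0.2],
}

def naive_bayes_class(features):
    """Return the class with the largest numerator p(c) * prod p(e_i | c).

    P(E) is omitted: it is identical for every class, so it cannot
    change which class wins.
    """
    best_class, best_log_score = None, -math.inf
    for c, prior in priors.items():
        log_score = math.log(prior)
        for e_i, p_one in zip(features, likelihoods[c]):
            # p(e_i | c) is p_one if the feature appears, else 1 - p_one.
            log_score += math.log(p_one if e_i else 1.0 - p_one)
        if log_score > best_log_score:
            best_class, best_log_score = c, log_score
    return best_class

print(naive_bayes_class([1, 0, 1]))  # 'pos' with these hypothetical numbers
```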
What is the version of Naive Bayes equation with which we can compute the posterior probabilities easily from the data?
- Combining the independence assumption with the actual naïve Bayes equation gives the following equation, with which we can easily compute the posterior probabilities from the data:
- p(C = c | E) = [p(c) * p(e1 | c) * ... * p(ek | c)] / [sum over all classes c' of p(c') * p(e1 | c') * ... * p(ek | c')]
- P(c): count the proportion of examples of class c among all examples
- P(e|c): count the proportion of examples in class c for which feature e appears
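A short sketch of these two counting steps on a hypothetical toy dataset with two binary features:

```python
from collections import Counter, defaultdict

# Hypothetical training data: (feature tuple, class label) pairs,
# with two binary features e1 and e2 per example.
data = [
    ((1, 0), "yes"), ((1, 1), "yes"), ((0, 1), "yes"),
    ((0, 0), "no"),  ((1, 0), "no"),  ((0, 0), "no"),
]

class_counts = Counter(label for _, label in data)

# P(c): proportion of examples of class c among all examples.
p_c = {c: class_counts[c] / len(data) for c in class_counts}

# P(e|c): proportion of class-c examples in which each feature appears.
feature_counts = defaultdict(lambda: [0, 0])
for features, label in data:
    for i, e in enumerate(features):
        feature_counts[label][i] += e

p_e_given_c = {
    c: [count / class_counts[c] for count in feature_counts[c]]
    for c in class_counts
}
print(p_c)          # {'yes': 0.5, 'no': 0.5}
print(p_e_given_c)  # roughly {'yes': [0.67, 0.67], 'no': [0.33, 0.0]}
```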
Advantages and Disadvantages of Naive Bayes
- Advantages of Naïve Bayes
- Efficient & surprisingly good performance for classification of real-world problems
- Performs well for categorical variables
- Efficient in terms of computation and needed storage space
- Violation of the independence assumption mostly doesn’t influence classification performance
- Incremental learner: each new instance updates the model's stored counts (see the sketch after this card)
- When new data becomes available, it does not have to be processed from the beginning
- Disadvantages of naïve Bayes
- Naïve Bayes should be used with caution when using the probability estimates for actual decision-making with costs and benefits
- When the independence assumption is violated (i.e., the pieces of evidence are strongly dependent on each other), the probability of the "correct" class is overestimated and the probabilities of the "wrong" classes are underestimated; in other words, the classification may still come out right, but the probability estimates themselves are incorrect
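As referenced above, a minimal sketch of the incremental update, continuing the hypothetical counts from the earlier counting example:

```python
# Naive Bayes stores only counts, so folding in one new labeled instance
# is a constant-time update; the old data never has to be reprocessed.
# Hypothetical running counts (two binary features, as in the example above).
class_counts = {"yes": 3, "no": 3}
feature_counts = {"yes": [2, 2], "no": [1, 0]}

def update(features, label):
    """Incrementally add one new labeled instance to the stored counts."""
    class_counts[label] += 1
    for i, e in enumerate(features):
        feature_counts[label][i] += e

update((1, 1), "no")
# P(e1 = 1 | no) moves from 1/3 to 2/4 without revisiting earlier examples.
print(feature_counts["no"][0] / class_counts["no"])  # 0.5
```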
How can lift be used with Naive Bayes?
What's the difference between the Optimal Bayes Classifier and the Naive Bayes Classifier?
What are the equations?
- The optimal Bayes classifier does not make any independence assumptions, while the naive Bayes classifier does
- Optimal Bayes: p(C = c | E) = p(E | C = c) * p(c) / p(E), where p(E | C = c) is the full joint conditional probability of the feature vector
- Naive Bayes: p(C = c | E) = p(e1 | c) * p(e2 | c) * ... * p(ek | c) * p(c) / p(E)
- The optimal classifier is therefore a complex model with high variance: with limited data, the full joint p(E | C = c) cannot be estimated reliably