Bayesian Methods Flashcards
What are Bayesian methods?
- provide practical computational techniques for learning
- are useful for interpreting non-probabilistic algorithms
- combine a priori knowledge about the hypotheses with the observed data to obtain probabilistic predictions
- can be difficult in practice
- computationally expensive; many examples are needed to estimate the probabilities correctly
What does the Bayes Theorem say?
- P(h|D) = P(D|h)P(h)/P(D)
- P(h), a priori (prior) probability of hypothesis h
- P(D), a priori probability of the training data D
- P(h|D), probability of h given D (posterior probability)
- P(D|h), probability of D given h (likelihood); a worked numeric example follows below
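A small worked example of the theorem; the hypothesis, prior, and likelihood values below are purely illustrative:

```python
# Hypothetical numbers: hypothesis h = "patient has the disease" with prior
# P(h) = 0.008, and a positive test result D with P(D|h) = 0.98 and
# P(D|not h) = 0.03.
p_h = 0.008
p_d_given_h = 0.98
p_d_given_not_h = 0.03

# P(D) via total probability, then Bayes theorem for the posterior P(h|D).
p_d = p_d_given_h * p_h + p_d_given_not_h * (1 - p_h)
p_h_given_d = p_d_given_h * p_h / p_d
print(p_h_given_d)  # ~0.21: the posterior stays low because the prior is low
```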
How is the hypothesis chosen in Bayesian methods?
- we want to select the most probable hypothesis given the learning data
- choose the maximum a posteriori hypothesis: hMAP = argmax_h P(D|h)P(h)
- if the a priori probabilities on the hypotheses are uniform (all the same), this reduces to the so-called maximum likelihood hypothesis: hML = argmax_h P(D|h)
How does brute force Bayesian learning work?
- compute the posterior probability for each hypothesis
- return the hypothesis with the highest a posteriori probability (a minimal sketch follows below)
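A minimal sketch of the brute-force procedure over a finite hypothesis space; `hypotheses`, `prior` (standing in for P(h)), and `likelihood` (standing in for P(D|h)) are hypothetical placeholders:

```python
def brute_force_map(hypotheses, prior, likelihood, data):
    # Unnormalized posterior P(D|h)P(h) for every hypothesis.
    scores = {h: likelihood(data, h) * prior(h) for h in hypotheses}
    # P(D) is the same for every h, so normalization does not change the argmax.
    return max(scores, key=scores.get)
```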
What assumptions do we make on the hypotheses?
- a uniform probability on the hypotheses
- deterministic, noise-free training data
How do we learn a real-valued function?
- real-valued target function f
- training examples (xi, di)
- di is a noisy observation: di = f(xi) + ei
- ei is a random noise variable, independent for each xi, with a Gaussian distribution
- we use maximum likelihood: the most likely hypothesis is the one that minimizes the sum of squared errors between the hypothesis and the data samples (see the sketch below)
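A minimal sketch of this fact: under Gaussian noise the maximum likelihood hypothesis minimizes sum_i (di − h(xi))^2, so if we assume (purely for illustration) a hypothesis space of linear functions, hML is just the ordinary least-squares fit:

```python
import numpy as np

def ml_linear_fit(x, d):
    # Model h(x) = w*x + b; the design matrix stacks x with a column of ones.
    A = np.column_stack([x, np.ones_like(x)])
    # Least-squares solution = maximum likelihood hypothesis under Gaussian noise.
    w, b = np.linalg.lstsq(A, d, rcond=None)[0]
    return w, b
```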
How do we learn a probabilistic function?
- the maximum likelihood hypothesis maximizes sum_i [di ln h(xi) + (1 − di) ln(1 − h(xi))], i.e. it minimizes the cross-entropy between the observed di and the predicted h(xi) (see the sketch below)
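A minimal sketch of the quantity being optimized; `d` (the observed 0/1 outcomes) and `h_x` (the probabilities predicted by a hypothesis h) are hypothetical inputs:

```python
import math

def cross_entropy(d, h_x):
    # The maximum likelihood hypothesis is the one making this value smallest.
    return -sum(di * math.log(hi) + (1 - di) * math.log(1 - hi)
                for di, hi in zip(d, h_x))
```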
How do we determine the most likely classification for new instances?
- the classification given by hMAP(x) is not necessarily the most probable classification
- Bayes optimal classification
- the best class vj for an instance is the one that maximizes the sum, over all available hypotheses hi, of P(vj|hi)P(hi|D) (see the sketch below)
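A minimal sketch of Bayes optimal classification over a finite hypothesis space; `posteriors` (mapping each hypothesis to P(h|D)) and `p_class_given_h` (standing in for P(v|h) on instance x) are hypothetical placeholders:

```python
def bayes_optimal_classify(x, classes, posteriors, p_class_given_h):
    # For each class v, sum P(v|h) * P(h|D) over all hypotheses; take the argmax.
    def score(v):
        return sum(p_class_given_h(v, h, x) * p_h_d
                   for h, p_h_d in posteriors.items())
    return max(classes, key=score)
```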
What is a Gibbs classifier?
- the Bayes optimal classifier is computationally expensive when there are many hypotheses
- the Gibbs classifier exploits randomness to reduce the cost
- choose one hypothesis at random, with probability P(h|D)
- use it to classify the new instance (see the sketch below)
- assuming the target concepts are drawn at random from H according to the a priori probability on H, the expected error of the Gibbs classifier is at most twice that of the Bayes optimal classifier
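A minimal sketch, assuming a finite hypothesis space; `posteriors` maps each hypothesis to P(h|D), and `predict(h, x)` is a hypothetical placeholder for applying hypothesis h to instance x:

```python
import random

def gibbs_classify(x, posteriors, predict):
    hypotheses = list(posteriors)
    weights = [posteriors[h] for h in hypotheses]
    # Sample a single hypothesis according to its posterior probability.
    h = random.choices(hypotheses, weights=weights, k=1)[0]
    return predict(h, x)
```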
What is a Naive Bayes classifier? When to use it?
- one of the simplest and most popular Bayesian learning techniques
- to be used when
- there are large datasets
- attributes are conditionally independent given the classification
- successful and popular in
- diagnosis
- classification of textual documents (spam, topics)
What is the algorithm for Naive Bayes?
- for each target value
- estimate the probability of the target value in the dataset
- for each possible value of each attribute
- estimate P(av|tv)
- return these estimates; a new instance is then classified with the target value that maximizes P(tv) multiplied by the product of P(av|tv) over its attribute values (see the sketch below)
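A minimal sketch of the algorithm with plain frequency estimates and no smoothing; the data layout (a list of (attribute_tuple, target_value) pairs) is an assumption for illustration:

```python
from collections import Counter, defaultdict

def train_naive_bayes(examples):
    # examples: list of (attribute_tuple, target_value) pairs.
    class_counts = Counter(tv for _, tv in examples)
    cond_counts = defaultdict(Counter)   # (attribute position, tv) -> value counts
    for attrs, tv in examples:
        for i, av in enumerate(attrs):
            cond_counts[(i, tv)][av] += 1
    priors = {tv: c / len(examples) for tv, c in class_counts.items()}
    return priors, cond_counts, class_counts

def classify_naive_bayes(attrs, priors, cond_counts, class_counts):
    # Return the target value maximizing P(tv) * prod_i P(a_i|tv).
    def score(tv):
        p = priors[tv]
        for i, av in enumerate(attrs):
            p *= cond_counts[(i, tv)][av] / class_counts[tv]
        return p
    return max(priors, key=score)
```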
Is the assumption of conditional independence necessary?
- no, it is often violated
- it often works anyway
- it is not necessary to estimate the posterior probabilities exactly, only to rank the correct class highest
- the posterior probabilities computed by Naive Bayes are often close to 1 or 0
What if no learning example with target value tv has the attribute value av?
- use the Bayesian m-estimate for P(av|tv): (nc + m·p) / (n + m), where n is the number of examples with target value tv, nc the number of those that also have attribute value av, p a prior estimate of P(av|tv), and m the equivalent sample size (see the sketch below)
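A minimal sketch of the estimate; the parameter names follow the formula above:

```python
def m_estimate(n_c, n, p, m):
    # n: examples with target value tv; n_c: those that also have value av;
    # p: prior estimate of P(av|tv) (e.g. uniform over the attribute's values);
    # m: equivalent sample size, a constant that weights the prior.
    return (n_c + m * p) / (n + m)
```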
How to classify text with Naive Bayes?
- each document is represented by a vector of words
- an attribute for each word
- review the slides/recording for the details (a rough sketch follows below)
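A rough bag-of-words sketch, not necessarily the exact recipe from the course: it estimates P(class) and P(word|class) with Laplace smoothing and scores a document in log space; the data layout (a list of (word_list, class) pairs) is an assumption for illustration:

```python
import math
from collections import Counter

def train_text_nb(docs):
    # docs: list of (list_of_words, class_label) pairs.
    class_counts = Counter(c for _, c in docs)
    word_counts = {c: Counter() for c in class_counts}
    for words, c in docs:
        word_counts[c].update(words)
    vocab = {w for words, _ in docs for w in words}
    return class_counts, word_counts, vocab

def classify_text(words, class_counts, word_counts, vocab):
    total_docs = sum(class_counts.values())
    def log_score(c):
        s = math.log(class_counts[c] / total_docs)
        denom = sum(word_counts[c].values()) + len(vocab)
        for w in words:
            s += math.log((word_counts[c][w] + 1) / denom)  # Laplace smoothing
        return s
    return max(class_counts, key=log_score)
```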
When should the Expectation Maximization (EM) algorithm be used?
- when data is only partially observable
- with unsupervised clustering
- supervised learning (some attributes with missing values)
- for example it can be used
- for learning Bayesian networks
- for learning Hidden Markov Models (a sketch of EM on a simple Gaussian mixture follows below)
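A minimal sketch of EM on the classic example of a mixture of two Gaussians with known, equal variance, where the two means are the only hidden parameters to estimate; the initialization and iteration count below are arbitrary choices for illustration:

```python
import math

def em_two_gaussians(xs, sigma=1.0, iters=50):
    mu = [min(xs), max(xs)]                       # crude initial means
    for _ in range(iters):
        # E-step: expected membership (responsibility) of each point in each Gaussian.
        resp = []
        for x in xs:
            logw = [-(x - m) ** 2 / (2 * sigma ** 2) for m in mu]
            mx = max(logw)
            w = [math.exp(l - mx) for l in logw]  # stabilized against underflow
            z = sum(w)
            resp.append([wi / z for wi in w])
        # M-step: re-estimate each mean as a responsibility-weighted average.
        for j in range(2):
            total = sum(r[j] for r in resp)
            if total > 0:
                mu[j] = sum(r[j] * x for r, x in zip(resp, xs)) / total
    return mu
```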