Bayesian Methods Flashcards
What are Bayesian methods?
- provide practical computational techniques for learning
- are useful for interpreting non-probabilistic algorithms
- combine a priori knowledge about the hypotheses with the observed data to obtain probabilistic predictions
- can be difficult in practice
- computationally expensive; many examples are needed to estimate the probabilities correctly
What does the Bayes Theorem say?
- P(h|D) = P(D|h)P(h)/P(D)
- P(h), a priori (prior) probability of hypothesis h
- P(D), a priori probability of the training data D
- P(h|D), probability of h given D (posterior probability)
- P(D|h), probability of D given h (likelihood); a worked numeric example follows below
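A small worked example of the theorem; the hypothesis, prior, and likelihood values below are purely illustrative:

```python
# Hypothetical numbers: hypothesis h = "patient has the disease" with prior
# P(h) = 0.008, and a positive test result D with P(D|h) = 0.98 and
# P(D|not h) = 0.03.
p_h = 0.008
p_d_given_h = 0.98
p_d_given_not_h = 0.03

# P(D) via total probability, then Bayes theorem for the posterior P(h|D).
p_d = p_d_given_h * p_h + p_d_given_not_h * (1 - p_h)
p_h_given_d = p_d_given_h * p_h / p_d
print(p_h_given_d)  # ~0.21: the posterior stays low because the prior is low
```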
How is the hypothesis chosen in Bayesian methods?
- we want to select the most probable hypothesis given the learning data
- choose the maximum a posteriori hypothesis: hMAP = argmax_h P(D|h)P(h)
- if the a priori probabilities on the hypotheses are uniform (all the same), this reduces to the so-called maximum likelihood hypothesis: hML = argmax_h P(D|h)
How does brute force Bayesian learning work?
- compute the posterior probability for each hypothesis
- return the hypothesis with the highest a posteriori probability (a minimal sketch follows below)
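A minimal sketch of the brute-force procedure over a finite hypothesis space; `hypotheses`, `prior` (standing in for P(h)), and `likelihood` (standing in for P(D|h)) are hypothetical placeholders:

```python
def brute_force_map(hypotheses, prior, likelihood, data):
    # Unnormalized posterior P(D|h)P(h) for every hypothesis.
    scores = {h: likelihood(data, h) * prior(h) for h in hypotheses}
    # P(D) is the same for every h, so normalization does not change the argmax.
    return max(scores, key=scores.get)
```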
What assumptions do we make on the hypotheses?
- a uniform probability on the hypotheses
- deterministic, noise-free training data
How do we learn a real-valued function?
- real-valued target function f
- training examples (xi, di)
- di is a noisy observation: di = f(xi) + ei
- ei is a random noise variable, independent for each xi, with a Gaussian distribution
- we use maximum likelihood: the most likely hypothesis is the one that minimizes the sum of squared errors between the hypothesis and the data samples (see the sketch below)
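A minimal sketch of this fact: under Gaussian noise the maximum likelihood hypothesis minimizes sum_i (di − h(xi))^2, so if we assume (purely for illustration) a hypothesis space of linear functions, hML is just the ordinary least-squares fit:

```python
import numpy as np

def ml_linear_fit(x, d):
    # Model h(x) = w*x + b; the design matrix stacks x with a column of ones.
    A = np.column_stack([x, np.ones_like(x)])
    # Least-squares solution = maximum likelihood hypothesis under Gaussian noise.
    w, b = np.linalg.lstsq(A, d, rcond=None)[0]
    return w, b
```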
How do we learn a probabilistic function?
- the maximum likelihood hypothesis maximizes sum_i [di ln h(xi) + (1 − di) ln(1 − h(xi))], i.e. it minimizes the cross-entropy between the observed di and the predicted h(xi) (see the sketch below)
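A minimal sketch of the quantity being optimized; `d` (the observed 0/1 outcomes) and `h_x` (the probabilities predicted by a hypothesis h) are hypothetical inputs:

```python
import math

def cross_entropy(d, h_x):
    # The maximum likelihood hypothesis is the one making this value smallest.
    return -sum(di * math.log(hi) + (1 - di) * math.log(1 - hi)
                for di, hi in zip(d, h_x))
```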
How do we determine the most likely classification for new instances?
- the classification given by hMAP(x) is not necessarily the most probable classification
- Bayes optimal classification
- the best class vj for an instance is the one that maximizes the sum, over all available hypotheses hi, of P(vj|hi)P(hi|D) (see the sketch below)
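A minimal sketch of Bayes optimal classification over a finite hypothesis space; `posteriors` (mapping each hypothesis to P(h|D)) and `p_class_given_h` (standing in for P(v|h) on instance x) are hypothetical placeholders:

```python
def bayes_optimal_classify(x, classes, posteriors, p_class_given_h):
    # For each class v, sum P(v|h) * P(h|D) over all hypotheses; take the argmax.
    def score(v):
        return sum(p_class_given_h(v, h, x) * p_h_d
                   for h, p_h_d in posteriors.items())
    return max(classes, key=score)
```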
What is a Gibbs classifier?
- the Bayes optimal classifier is computationally expensive when there are many hypotheses
- the Gibbs classifier exploits randomness to reduce the cost
- choose one hypothesis at random, with probability P(h|D)
- use it to classify the new instance (see the sketch below)
- assuming the target concepts are drawn at random from H according to the a priori probability on H, the expected error of the Gibbs classifier is at most twice that of the Bayes optimal classifier
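A minimal sketch, assuming a finite hypothesis space; `posteriors` maps each hypothesis to P(h|D), and `predict(h, x)` is a hypothetical placeholder for applying hypothesis h to instance x:

```python
import random

def gibbs_classify(x, posteriors, predict):
    hypotheses = list(posteriors)
    weights = [posteriors[h] for h in hypotheses]
    # Sample a single hypothesis according to its posterior probability.
    h = random.choices(hypotheses, weights=weights, k=1)[0]
    return predict(h, x)
```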
What is a Naive Bayes classifier? When to use it?
- one of the simplest and most popular Bayesian learning techniques
- to be used when
- there are large datasets
- attributes are conditionally independent given the classification
- successful and popular in
- diagnosis
- classification of textual documents (spam, topics)
What is the algorithm for Naive Bayes?
- for each target value
- estimate the probability of the target value in the dataset
- for each possible value of each attribute
- estimate P(av|tv)
- return these estimates; a new instance is then classified with the target value that maximizes P(tv) multiplied by the product of P(av|tv) over its attribute values (see the sketch below)
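A minimal sketch of the algorithm with plain frequency estimates and no smoothing; the data layout (a list of (attribute_tuple, target_value) pairs) is an assumption for illustration:

```python
from collections import Counter, defaultdict

def train_naive_bayes(examples):
    # examples: list of (attribute_tuple, target_value) pairs.
    class_counts = Counter(tv for _, tv in examples)
    cond_counts = defaultdict(Counter)   # (attribute position, tv) -> value counts
    for attrs, tv in examples:
        for i, av in enumerate(attrs):
            cond_counts[(i, tv)][av] += 1
    priors = {tv: c / len(examples) for tv, c in class_counts.items()}
    return priors, cond_counts, class_counts

def classify_naive_bayes(attrs, priors, cond_counts, class_counts):
    # Return the target value maximizing P(tv) * prod_i P(a_i|tv).
    def score(tv):
        p = priors[tv]
        for i, av in enumerate(attrs):
            p *= cond_counts[(i, tv)][av] / class_counts[tv]
        return p
    return max(priors, key=score)
```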
Is the assumption of conditional independence necessary?
- no, it is often violated
- it often works anyway
- it is not necessary to estimate the posterior probabilities exactly, only to rank the correct class highest
- the posterior probabilities computed by Naive Bayes are often close to 1 or 0
What if no learning example with target value tv has the attribute value av?
- use the Bayesian m-estimate for P(av|tv): (nc + m·p) / (n + m), where n is the number of examples with target value tv, nc the number of those that also have attribute value av, p a prior estimate of P(av|tv), and m the equivalent sample size (see the sketch below)
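A minimal sketch of the estimate; the parameter names follow the formula above:

```python
def m_estimate(n_c, n, p, m):
    # n: examples with target value tv; n_c: those that also have value av;
    # p: prior estimate of P(av|tv) (e.g. uniform over the attribute's values);
    # m: equivalent sample size, a constant that weights the prior.
    return (n_c + m * p) / (n + m)
```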
How to classify text with Naive Bayes?
- each document is represented by a vector of words
- an attribute for each word
- review the slides/recording for the details (a rough sketch follows below)
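A rough bag-of-words sketch, not necessarily the exact recipe from the course: it estimates P(class) and P(word|class) with Laplace smoothing and scores a document in log space; the data layout (a list of (word_list, class) pairs) is an assumption for illustration:

```python
import math
from collections import Counter

def train_text_nb(docs):
    # docs: list of (list_of_words, class_label) pairs.
    class_counts = Counter(c for _, c in docs)
    word_counts = {c: Counter() for c in class_counts}
    for words, c in docs:
        word_counts[c].update(words)
    vocab = {w for words, _ in docs for w in words}
    return class_counts, word_counts, vocab

def classify_text(words, class_counts, word_counts, vocab):
    total_docs = sum(class_counts.values())
    def log_score(c):
        s = math.log(class_counts[c] / total_docs)
        denom = sum(word_counts[c].values()) + len(vocab)
        for w in words:
            s += math.log((word_counts[c][w] + 1) / denom)  # Laplace smoothing
        return s
    return max(class_counts, key=log_score)
```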
When should the Expectation Maximization (EM) algorithm be used?
- when data is only partially observable
- with unsupervised clustering
- supervised learning (some attributes with missing values)
- for example it can be used
- for learning Bayesian networks
- for learning Hidden Markov Models (a sketch of EM on a simple Gaussian mixture follows below)
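A minimal sketch of EM on the classic example of a mixture of two Gaussians with known, equal variance, where the two means are the only hidden parameters to estimate; the initialization and iteration count below are arbitrary choices for illustration:

```python
import math

def em_two_gaussians(xs, sigma=1.0, iters=50):
    mu = [min(xs), max(xs)]                       # crude initial means
    for _ in range(iters):
        # E-step: expected membership (responsibility) of each point in each Gaussian.
        resp = []
        for x in xs:
            logw = [-(x - m) ** 2 / (2 * sigma ** 2) for m in mu]
            mx = max(logw)
            w = [math.exp(l - mx) for l in logw]  # stabilized against underflow
            z = sum(w)
            resp.append([wi / z for wi in w])
        # M-step: re-estimate each mean as a responsibility-weighted average.
        for j in range(2):
            total = sum(r[j] for r in resp)
            if total > 0:
                mu[j] = sum(r[j] * x for r, x in zip(resp, xs)) / total
    return mu
```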