Bayesian Methods Flashcards

1
Q

What are Bayesian methods?

A
  • provide computational techniques for learning
  • are useful for interpreting non-probabilistic algorithms
  • combine a priori knowledge about the hypotheses with the observed data to produce probability predictions
  • difficult in practice
    • computationally expensive, and many examples are needed for a correct estimation
2
Q

What does the Bayes Theorem say?

A
  • P(h|D) = P(D|h)P(h)/P(D)
    • P(h), a priori probability of the hypothesis h
    • P(D), a priori probability of the training data
    • P(h|D), a posteriori probability of h given D
    • P(D|h), probability of D given h (the likelihood)
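A tiny numerical sketch of the rule; the prior and likelihood values below are invented purely for illustration:

```python
# Hypothetical example of Bayes' rule with two hypotheses h1, h2 and observed data D.
# All numbers are invented for illustration only.
prior = {"h1": 0.7, "h2": 0.3}        # P(h)
likelihood = {"h1": 0.2, "h2": 0.9}   # P(D|h)

# P(D) by total probability over the hypotheses
p_D = sum(likelihood[h] * prior[h] for h in prior)

posterior = {h: likelihood[h] * prior[h] / p_D for h in prior}  # P(h|D)
print(posterior)  # h2 becomes the more probable hypothesis after observing D
```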
3
Q

How is the hypothesis chosen in Bayesian methods?

A
  • we want to select the most probable hypothesis given the learning data
    • the maximum a posteriori hypothesis (hMAP)
  • if the a priori probabilities of the hypotheses are uniform (all equal), this reduces to choosing the so-called maximum likelihood hypothesis (hML)
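In symbols (the argmax is taken over the hypothesis space H):

```latex
h_{MAP} = \arg\max_{h \in H} P(h \mid D) = \arg\max_{h \in H} P(D \mid h)\,P(h)
\qquad
h_{ML} = \arg\max_{h \in H} P(D \mid h) \quad \text{(uniform } P(h)\text{)}
```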
4
Q

How does brute force Bayesian learning work?

A
  • compute the a posteriori probability P(h|D) for each hypothesis h in H
  • return the hypothesis with the highest a posteriori probability (hMAP), as in the sketch below
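A minimal sketch of the brute-force MAP learner, assuming the prior P(h) and the likelihood P(D|h) are available as functions (all names below are illustrative):

```python
def brute_force_map(hypotheses, prior, likelihood, D):
    """Return the MAP hypothesis for the training data D.

    hypotheses       -- iterable of candidate hypotheses (the space H)
    prior(h)         -- P(h), a priori probability of h
    likelihood(D, h) -- P(D|h), probability of the data given h
    """
    # P(D) is the same for every h, so the unnormalized product
    # P(D|h) * P(h) is enough to take the argmax.
    scores = {h: likelihood(D, h) * prior(h) for h in hypotheses}
    return max(scores, key=scores.get)
```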

5
Q

What assumptions do we make on the hypotheses?

A
  • a uniform a priori probability on the hypotheses
  • deterministic, noise-free training data

6
Q

How do we learn a real-valued function?

A
  • real-valued target function f
    • learning examples ⟨xi, di⟩
    • di contains some noise, di = f(xi) + ei
    • ei is a random variable (noise), independent for each xi, with a zero-mean Gaussian distribution
  • we use maximum likelihood to find the hypothesis that minimizes the squared distance between the hypothesis and the data samples
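Sketch of the standard derivation: with independent, zero-mean Gaussian noise, maximizing the likelihood of the examples is equivalent to minimizing the sum of squared errors:

```latex
h_{ML} = \arg\max_{h} \prod_i \frac{1}{\sqrt{2\pi\sigma^2}}
         \, e^{-\frac{(d_i - h(x_i))^2}{2\sigma^2}}
       = \arg\min_{h} \sum_i \bigl(d_i - h(x_i)\bigr)^2
```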
7
Q

How do we learn a probabilistic function?

A
  • we use maximum likelihood: the best hypothesis maximizes Σi di ln h(xi) + (1 − di) ln(1 − h(xi)), i.e. the negative of the cross-entropy between the target values and the predicted probabilities
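Sketch of the derivation, for boolean targets di ∈ {0, 1} with h(xi) read as the predicted probability that di = 1:

```latex
h_{ML} = \arg\max_{h} \prod_i h(x_i)^{d_i}\,\bigl(1 - h(x_i)\bigr)^{1 - d_i}
       = \arg\max_{h} \sum_i d_i \ln h(x_i) + (1 - d_i)\ln\bigl(1 - h(x_i)\bigr)
```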
8
Q

How do we determine the most likely classification for new instances?

A
  • the classification given by hMAP(x) is not necessarily the most likely one
  • Bayes optimal classification
    • the best class for an instance is the one that maximizes the sum of P(vj|hi)P(hi|D) over all available hypotheses
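In symbols, the Bayes optimal classification of a new instance is:

```latex
v_{OB} = \arg\max_{v_j \in V} \sum_{h_i \in H} P(v_j \mid h_i)\, P(h_i \mid D)
```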
9
Q

What is a Gibbs classifier?

A
  • the Bayes optimal classifier is computationally expensive when there are many hypotheses
    • the Gibbs classifier exploits randomness to reduce the cost
  • choose one hypothesis at random, with probability P(h|D)
    • use it to classify the new instance
  • assuming the target concepts are drawn at random from H according to the a priori probability on H, the expected error of the Gibbs classifier is at most twice that of the Bayes optimal classifier (a 2-approximation)
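A minimal sketch of the sampling step, assuming the posteriors P(h|D) have already been computed (the names and data structures are illustrative):

```python
import random

def gibbs_classify(hypotheses, posterior, x):
    """Classify x with one hypothesis sampled with probability P(h|D).

    hypotheses -- list of callable hypotheses h(x) -> class
    posterior  -- dict mapping each hypothesis to P(h|D)
    """
    weights = [posterior[h] for h in hypotheses]
    h = random.choices(hypotheses, weights=weights, k=1)[0]  # sample h ~ P(h|D)
    return h(x)  # the sampled hypothesis alone classifies the new instance
```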
10
Q

What is a Naive Bayes classifier? When to use it?

A
  • one of the simplest and most popular techniques for applying Bayesian methods
  • to be used when
    • there are large datasets
    • the attributes are conditionally independent given the classification
  • successful and popular in
    • diagnosis
    • classification of textual documents (spam, topics)
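Under the conditional-independence assumption, the classification rule for an instance with attribute values a1, …, an is:

```latex
v_{NB} = \arg\max_{v_j \in V} P(v_j) \prod_i P(a_i \mid v_j)
```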
11
Q

What is the algorithm for Naive Bayes?

A
  • learning: for each target value tv
    • estimate the probability P(tv) of the target value in the dataset
    • for each possible value av of each attribute
      • estimate P(av|tv)
    • return these two sets of estimates
  • classification: return the target value that maximizes P(tv) · Πi P(ai|tv) for the new instance (see the sketch below)
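A minimal sketch of both phases with plain frequency estimates (no smoothing; see the m-estimate card below). Function and variable names are illustrative:

```python
from collections import Counter, defaultdict

def train_naive_bayes(examples):
    """examples: list of (attributes_tuple, target_value) pairs.
    Returns frequency estimates of P(tv) and P(av|tv)."""
    n = len(examples)
    target_counts = Counter(tv for _, tv in examples)
    p_target = {tv: c / n for tv, c in target_counts.items()}              # P(tv)

    value_counts = defaultdict(Counter)  # (attribute index, tv) -> counts of values
    for attrs, tv in examples:
        for i, av in enumerate(attrs):
            value_counts[(i, tv)][av] += 1
    p_value = {key: {av: c / target_counts[key[1]] for av, c in counts.items()}
               for key, counts in value_counts.items()}                     # P(av|tv)
    return p_target, p_value

def classify_naive_bayes(p_target, p_value, attrs):
    """Return the target value maximizing P(tv) * prod_i P(ai|tv)."""
    def score(tv):
        s = p_target[tv]
        for i, av in enumerate(attrs):
            s *= p_value.get((i, tv), {}).get(av, 0.0)  # 0.0 if value never seen with tv
        return s
    return max(p_target, key=score)
```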
12
Q

Is the assumption of conditional independence necessary?

A
  • no, and it is often violated
    • the classifier works well anyway
    • it is not necessary to estimate the a posteriori probabilities correctly, only to rank the correct class highest
    • the a posteriori probabilities calculated by Naive Bayes are often close to 1 or 0 (even when the true ones are not)
13
Q

What if no learning example with target value tv has the attribute value av?

A
  • use the Bayesian m-estimate for P(av|tv) instead of the raw frequency (a raw estimate of 0 would otherwise zero out the whole product)
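The standard m-estimate (symbols follow the usual textbook definition, which may differ slightly from the slides):

```latex
P(a_v \mid t_v) \approx \frac{n_c + m\,p}{n + m}
```

where n is the number of training examples with target value tv, nc the number of those that also have attribute value av, p a prior estimate of P(av|tv) (often uniform, 1/k for k possible values), and m the equivalent sample size, i.e. the weight given to the prior.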
14
Q

How to classify text with Naive Bayes?

A
  • each document is represented by a vector of words
    • an attribute for each word
  • review the slides/recording for the details
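One commonly used estimate for the word probabilities, an m-estimate with a uniform prior over the vocabulary (a standard choice, which may or may not match the slides exactly):

```latex
P(w_k \mid v_j) = \frac{n_k + 1}{n + |\mathit{Vocabulary}|}
```

where n is the total number of word positions in all documents of class vj and nk the number of occurrences of word wk among them.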
15
Q

When should the Expectation Maximization (EM) algorithm be used?

A
  • when data is only partially observable
  • in unsupervised clustering
  • in supervised learning (some attributes with missing values)
  • for example, it can be used
    • for learning Bayesian networks
    • for learning Hidden Markov Models
16
Q

How can EM be used to estimate the means of k Gaussians?

A
  • input
    • X: instances generated by a mixture of k Gaussians (it is not known which Gaussian generated which instance)
    • the Gaussian means are unknown
  • objective
    • determine the maximum-likelihood estimates of the Gaussian means
  • each instance can be seen in the form yi = (xi, zi1, …, zik), where zij = 1 if Gaussian j generated sample xi (the zij are unobservable)
  • the algorithm chooses an initial hypothesis for the means and then repeats (see the sketch below):
    • E step -> calculate the expected value E[zij] of each unobservable variable, assuming the current hypothesis holds
    • M step -> compute the new maximum-likelihood hypothesis, assuming the unobservable variables take the expected values computed in the E step
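A minimal sketch of the two steps for one-dimensional samples and k Gaussians with a known, common variance; everything here (function name, parameters, the use of NumPy) is illustrative rather than the course's exact formulation:

```python
import numpy as np

def em_gaussian_means(x, k, sigma2=1.0, n_iter=50, seed=0):
    """Estimate the means of the k Gaussians that generated the 1-D samples x."""
    rng = np.random.default_rng(seed)
    mu = rng.choice(x, size=k, replace=False)  # initial hypothesis for the means
    for _ in range(n_iter):
        # E step: E[z_ij] = P(Gaussian j generated x_i | current means)
        resp = np.exp(-(x[:, None] - mu[None, :]) ** 2 / (2 * sigma2))
        resp /= resp.sum(axis=1, keepdims=True)
        # M step: new maximum-likelihood means, weighting each sample by E[z_ij]
        mu = (resp * x[:, None]).sum(axis=0) / resp.sum(axis=0)
    return mu
```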