Week 5 Flashcards
MAP (abbreviation)
Maximum A Posteriori method
What does the MAP method output?
It outputs only a single value of the parameter, the one that maximizes the posterior, so it discards the uncertainty in the posterior.
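A minimal sketch of a MAP estimate computed by optimization (the Gaussian likelihood, the N(0, 1) prior, and the data values are all assumptions for illustration):

import numpy as np
from scipy import optimize, stats

# Hypothetical model: Gaussian likelihood with unknown mean w, N(0, 1) prior on w.
data = np.array([1.2, 0.7, 1.5, 0.9])

def neg_log_posterior(w):
    log_prior = stats.norm.logpdf(w, loc=0.0, scale=1.0)
    log_lik = stats.norm.logpdf(data, loc=w, scale=1.0).sum()
    return -(log_prior + log_lik)  # minimizing this maximizes the posterior

w_map = optimize.minimize_scalar(neg_log_posterior).x
print(w_map)  # a single number; all posterior uncertainty is discarded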
What is the Laplace approximation?
When you don’t have a closed-form formula for the posterior, the Laplace approximation finds a different (Gaussian) distribution that tries to approximate the posterior.
In what ways should the approximation agree with the posterior in Laplace approximation?
The point of maximization of the posterior, w^ (the MAP estimate, not the maximum likelihood solution), needs to be the same; this fixes the choice of the mean for the approximation. In addition, the behaviour of the area around w^ (the curvature) needs to be the same in both functions.
What property does the multivariate Gaussian of a Laplace approximation need to have in order to satisfy agreement of the area around w^?
Its log-density needs to have the same Hessian at w^ as the log of the actual posterior; equivalently, its covariance is the negative inverse of that Hessian.
What two things of the posterior do you need in order to construct the Laplace approximation for a particular posterior?
The point of maximization, w^, and the second derivatives (the Hessian of the log posterior) at that point.
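A minimal sketch of constructing a Laplace approximation (the Gamma(5, 2) "posterior" is an assumption, chosen so the result can be checked by hand: mode (a-1)/b = 2, standard deviation 1):

import numpy as np
from scipy import optimize, stats

# Hypothetical posterior: Gamma(shape=5, rate=2), treated as if only its log density were known.
a, b = 5.0, 2.0
def log_post(w):
    return stats.gamma.logpdf(w, a, scale=1.0 / b)

# Ingredient 1: the point of maximization, w^.
w_hat = optimize.minimize_scalar(lambda w: -log_post(w),
                                 bounds=(1e-6, 20.0), method="bounded").x

# Ingredient 2: the second derivative of the log posterior at w^ (finite differences).
eps = 1e-4
d2 = (log_post(w_hat + eps) - 2 * log_post(w_hat) + log_post(w_hat - eps)) / eps ** 2

# The Laplace approximation: a Gaussian with mean w^ and variance -1/d2.
laplace = stats.norm(loc=w_hat, scale=np.sqrt(-1.0 / d2))
print(w_hat, laplace.std())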
T/F:
The gamma distribution is a continuous distribution
True
Over what is the density of the gamma distribution defined?
On the positive real numbers, (0, ∞).
Over what is a Gaussian distribution defined?
Over all real numbers, positive and negative (the entire real line).
Over what is the beta distribution defined?
On the interval between 0 and 1.
In Bayesian statistics, what is the distribution called that you use to compute the probability that a new datapoint takes a certain value?
The Bayesian predictive distribution.
What happens in Monte Carlo sampling?
Draw many independent random samples of w from the posterior, compute P(t_new | w, …) for each sample, and take the average of these values.
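A minimal sketch of this average (the N(0, 1) posterior over w and the logistic form of P(t_new = 1 | w) are both assumptions for the demo):

import numpy as np

rng = np.random.default_rng(0)
w_samples = rng.normal(loc=0.0, scale=1.0, size=10_000)  # independent posterior draws of w
p_given_w = 1.0 / (1.0 + np.exp(-w_samples))             # P(t_new = 1 | w) for each sample
p_tnew = p_given_w.mean()                                # Monte Carlo average of the predictions
print(p_tnew)  # approximates P(t_new = 1) with w integrated out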
How can you write the expected value of a discrete random variable?
As a sum over all possible values, each weighted by its probability: E[X] = Σ_x x · P(X = x).
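For example, the expected value of a fair die roll as a probability-weighted sum:

# Each face value weighted by its probability of 1/6.
values = [1, 2, 3, 4, 5, 6]
probs = [1 / 6] * 6
expected = sum(x * p for x, p in zip(values, probs))
print(expected)  # 3.5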
What is the output of the Metropolis-Hastings algorithm?
A sequence of random samples, w1, …, wNs, where Ns is a number we choose beforehand.
What requirement of samples do you drop when using the MH algorithm?
The requirement that samples are independent.
What does the MH algorithm do?
It explores the space of w’s; when it finds a region where the posterior density is large, it draws more samples there before it continues to another region.
Describe the MH algorithm in pseudo-code:
choose w1
for s=2,3,…,Ns
{ propose new sample ~ws
compute acceptance ratio r
with probability min(1, r)
{ accept proposal, so ws = ~ws }
else
{ reject proposal, so ws = ws-1 }
}
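A runnable sketch of this pseudo-code with a symmetric Gaussian (random-walk) proposal, so the proposal ratio is 1; the standard-Gaussian target posterior and the step size are assumptions for the demo:

import numpy as np

rng = np.random.default_rng(0)

def metropolis_hastings(log_posterior, w1, n_samples, step=0.5):
    # Random-walk MH: propose by adding Gaussian noise to the current sample.
    w = np.atleast_1d(np.asarray(w1, dtype=float))
    samples = [w]
    for s in range(2, n_samples + 1):
        w_prop = w + rng.normal(scale=step, size=w.shape)  # propose ~ws
        log_r = log_posterior(w_prop) - log_posterior(w)   # log of the posterior ratio
        if np.log(rng.uniform()) < min(0.0, log_r):        # accept with probability min(1, r)
            w = w_prop                                     # accept: ws = ~ws
        samples.append(w)                                  # on rejection this keeps ws = ws-1
    return np.array(samples)

chain = metropolis_hastings(lambda w: -0.5 * np.sum(w ** 2), w1=[3.0], n_samples=5000)
print(chain.mean(), chain.std())  # should approach 0 and 1 for this target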
How do you pick w1 in the MH algorithm?
By sampling from the prior.
Is MH an optimization algorithm?
No, it is a sampling algorithm.
What is the acceptance ratio a product of?
The posterior ratio and the proposal ratio.
How is the posterior ratio calculated in the MH algorithm?
The posterior density at the proposed point divided by the posterior density at the previous point.
What is the goal of the proposal ratio?
It compensates for tendencies (asymmetries) of the proposal distribution. If the proposal density has a tendency to propose points to the upper right, the ratio corrects for this so that upper-right points are not systematically favoured.
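Putting the two ratios together (standard MH notation; q denotes the proposal density, which these cards do not name explicitly):

r = [ p(~ws | data) / p(ws-1 | data) ] × [ q(ws-1 | ~ws) / q(~ws | ws-1) ]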
Why is the proposal ratio for a multivariate Gaussian (centred on the current point) always equal to 1?
Because such a proposal is symmetric: stepping from ws-1 to ~ws is exactly as likely as stepping back, so the two proposal densities cancel.
What is ||x||?
The Euclidean norm of x.
What is a Euclidean norm?
The length of the vector measured from the origin: ||x|| = √(x1² + … + xd²).
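A one-line check in Python (numpy’s linalg.norm computes exactly this square root of the sum of squares):

import numpy as np

x = np.array([3.0, 4.0])
print(np.linalg.norm(x))  # 5.0 = √(3² + 4²)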
For MH with a symmetric proposal distribution, the acceptance ratio depends on…
the posterior density at the proposed point and the posterior density at the previous point.
Why can the MAP estimate be considered a step up from the maximum likelihood solution?
It incorporates the prior, rather than relying on the likelihood alone.
Burn-in
The first samples produced by the MH algorithm, which are typically discarded; called burn-in because it may take a while for the chain to find the regions where the posterior density is large.
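A minimal sketch of discarding burn-in; the chain here is synthetic (its first 500 samples imitate a chain that started far from the high-density region) and the cutoff of 500 is an assumed value:

import numpy as np

rng = np.random.default_rng(1)
early = rng.normal(5.0, 1.0, 500)    # chain still wandering toward the target
later = rng.normal(0.0, 1.0, 4500)   # chain has reached the high-density region
chain = np.concatenate([early, later])

burn_in = 500                        # assumed cutoff; in practice chosen by inspecting the chain
kept = chain[burn_in:]
print(chain.mean(), kept.mean())     # the naive mean is biased by the burn-in samples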