L8: Markov Chain Monte Carlo Flashcards
What is Bayes Rule?
p(X | Y) = p(X and Y)/p(Y) = p(Y|X)*p(X)/p(Y)
p(X|Y)= posterior
p(Y|X)= likelihood
p(X) = prior
p(Y) = evidence
How to calculate posterior expectation E[f(X) |Y]?
Easy f and analytical posterior: direct calculation
Hard f and analytical posterior: MC estimation
Any f and complex posterior: MCMC
What is a transition density?
p(X_k+1 | X_k = x)
What is the transition distribution of an MC?
A map q: XxX to [0, inf] for all x,z in X as q(z|x) =p(X2=z|X1=x) = p(X_k+1 = z| X_k = x)
i.e. that the prob of the first value being x and the second value being z is independent of time
What is a stationary distribution?
If pi is a distribution from X to [0, inf] for all x,z in X it holds that pi(x)q(z|x) = pi(z)q(x|z)
If this exists then pi(x) = p(X_inf = x)
What is a limiting distribution?
A limiting distribution is such a distribution π
that no matter what the initial distribution is, the distribution over states converges to π
as the number of steps goes to infinity.
Must be a stationary distribution
What is the Metropolis-Hastings algorithm? What does it require?
A popular instance of Markov Chain Monte Carlo.
Requires:
- a function h(x) proportional to the posterior
- a proposal distribution g(Z|x)
Steps in Metropolis-Hastings algorithm?
- Set initial state x1 and start iterating from k = 1:
- Sample candidate z from proposal distribution g(Z | xk )
- Decide to accept with probability α, or reject with probability 1 − α,
where
α = min {1,h(z)g(xk | z)/h(xk )g(z | xk )} - On accept, set x_k+1 = z; on reject set x_k+1 = x_k
- Go to step 2.
After sufficient “burn-in” iterations, this will generate samples from p(X).
Will converge to our target distribution p(x)- think about histogram (guaranteed to do so but speed not guaranteed)
Make sure to throw away the samples from the burn-in period.
What is the basic idea of Markov Chain Monte Carlo?
Can construct a Markov Chain with the relevant probabilities and sample from it at each time step
It has a limit distribution so at large times we are taking a sample from p(X)
Construct it so that detailed balance holds