Chapter 8: Markov Chain Monte Carlo Flashcards
Briefly write down the formulas for the conditional and joint PMFs. Afterwards, give Bayes' rule.
Conditional PMF: p(X1 | X2) = p(X1, X2) / p(X2)
Joint PMF: p(X1, X2) = p(X2 | X1) * p(X1)
Bayes' Rule: p(X1 | X2) = p(X1, X2) / p(X2)
= ( p(X2 | X1) * p(X1) ) / p(X2)
Bayes' rule also holds for probability density functions.
Explain each component of Bayes’ Rule
Bayes’ Rule:
p(X1 | X2) = ( p(X2 | X1) * p(X1) ) / p(X2)
Here:
- p(X1) is the prior probability
- p(X1|X2) is the posterior probability
- p(X2 | X1) is the likelihood function
- p(X2) is the evidence or partition function (the unconditional, marginal probability of X2), which acts as the normalizing constant
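As a rough numeric sketch of these components (the probabilities below are made up for illustration), the posterior follows directly from the prior, the likelihood, and the evidence:

```python
# Minimal numeric sketch of Bayes' rule for two discrete variables.
# The prior and likelihood values are illustrative, not from any real model.

# Prior p(X1) over two hypotheses
p_x1 = {"A": 0.3, "B": 0.7}

# Likelihood p(X2 = obs | X1)
likelihood = {"A": 0.8, "B": 0.1}

# Evidence p(X2 = obs) via the law of total probability
p_x2 = sum(likelihood[h] * p_x1[h] for h in p_x1)

# Posterior p(X1 | X2 = obs)
posterior = {h: likelihood[h] * p_x1[h] / p_x2 for h in p_x1}
print(posterior)  # {'A': ~0.774, 'B': ~0.226}
```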
Explain transition Distributions in Markov chains
The transition distribution of a Markov chain describes the likelihood of moving from one state to another state in the chain. It is represented by the map q(z | x), where x and z are states in the chain, and q(z | x) is the probability of transitioning from state x to state z. In other words, it is the probability of the next state being z, given that the current state is x. Here we assume the transition distribution is time-homogeneous, meaning that the probability of transitioning from one state to another does not change over time.
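A minimal sketch for a hypothetical 3-state chain, where each row of the matrix Q is the transition distribution q(· | x):

```python
import numpy as np

# Hypothetical 3-state chain; row x of Q is the transition distribution q(. | x).
Q = np.array([
    [0.5, 0.3, 0.2],   # q(z | x=0)
    [0.1, 0.6, 0.3],   # q(z | x=1)
    [0.2, 0.2, 0.6],   # q(z | x=2)
])

rng = np.random.default_rng(0)

def step(x):
    """Sample the next state z ~ q(z | x)."""
    return rng.choice(3, p=Q[x])

# Simulate a short trajectory; time-homogeneity means Q never changes over time.
x = 0
trajectory = [x]
for _ in range(10):
    x = step(x)
    trajectory.append(x)
print(trajectory)
```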
What is meant with a stationary distribution of the MC?
A stationary distribution is a probability distribution for the states of a Markov chain that does not change over time. In other words, if a Markov chain is in a stationary distribution, the probability of being in any particular state does not change as time goes on.
It is important to note that homogeneity alone does not guarantee that the Markov chain has a stationary distribution. A homogeneous Markov chain will reach a steady state if it is irreducible and aperiodic; if it is not irreducible or aperiodic, it is not guaranteed to reach a steady state. An irreducible Markov chain is one in which it is possible to go from any state to any other state (there is a path between every pair of states), while an aperiodic Markov chain is one that is not cyclical, meaning there is no fixed period after which the chain is forced to return to a state.
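As a sketch, for a hypothetical irreducible and aperiodic transition matrix, repeatedly pushing an arbitrary starting distribution through the chain converges to a distribution that no longer changes:

```python
import numpy as np

# Hypothetical transition matrix (all entries positive, rows sum to 1),
# so the chain is irreducible and aperiodic.
Q = np.array([
    [0.5, 0.3, 0.2],
    [0.1, 0.6, 0.3],
    [0.2, 0.2, 0.6],
])

# Push an arbitrary starting distribution through the chain many times.
p = np.array([1.0, 0.0, 0.0])
for _ in range(1000):
    p = p @ Q

print(p)      # the stationary distribution pi
print(p @ Q)  # unchanged by another step: pi Q = pi
```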
What is meant with the detailed balance condition for a stationary distribution?
The detailed balance condition, also known as the local balance equation, is a sufficient condition for a Markov chain to have a stationary distribution. It states that for any two states i and j, the probability flow from i to j is equal to the probability flow from j to i.
In mathematical terms, if p_i is the probability of being in state i, then the following equation must hold for all states i and j:
p_i * q(i, j) = p_j * q(j, i)
where q(i, j) is the transition probability from state i to state j.
The detailed balance equation ensures that the probability flow between any two states is balanced: the flow from state i to state j exactly offsets the flow from state j to state i, so the distribution over states does not change from one step to the next and is therefore stationary.
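A small sketch that checks detailed balance numerically for a hypothetical transition matrix and a candidate stationary distribution:

```python
import numpy as np

def satisfies_detailed_balance(p, Q, tol=1e-10):
    """Check p_i * q(i, j) == p_j * q(j, i) for all pairs (i, j)."""
    flow = p[:, None] * Q          # flow[i, j] = p_i * q(i, j)
    return np.allclose(flow, flow.T, atol=tol)

# Hypothetical symmetric random walk on 3 states: detailed balance holds
# with respect to the uniform distribution.
Q = np.array([
    [0.50, 0.25, 0.25],
    [0.25, 0.50, 0.25],
    [0.25, 0.25, 0.50],
])
pi = np.array([1/3, 1/3, 1/3])
print(satisfies_detailed_balance(pi, Q))  # True
```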
Explain when a markov chain has a unique stationary distribution.
If all the entries of the transition probability matrix are positive, the Markov chain is irreducible, which means that it is possible to go from any state to any other state.
Furthermore, if the Markov chain is aperiodic, that is, if it is not cyclical (there is no fixed period after which the chain is forced to return to a state), it is guaranteed to reach a steady state, and this steady state is a stationary distribution.
In this case, since all the entries of the transition probability matrix are positive, the Markov chain is irreducible and aperiodic, and therefore it has a unique stationary distribution.
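As a sketch, for a hypothetical strictly positive transition matrix the unique stationary distribution can be read off as the left eigenvector of Q for eigenvalue 1:

```python
import numpy as np

# All entries of Q are strictly positive, so the chain is irreducible and
# aperiodic and therefore has a unique stationary distribution.
Q = np.array([
    [0.5, 0.3, 0.2],
    [0.1, 0.6, 0.3],
    [0.2, 0.2, 0.6],
])

# The stationary distribution is the left eigenvector of Q for eigenvalue 1.
eigvals, eigvecs = np.linalg.eig(Q.T)
idx = np.argmin(np.abs(eigvals - 1.0))
pi = np.real(eigvecs[:, idx])
pi /= pi.sum()                      # normalize to a probability distribution
print(pi)                           # the unique stationary distribution
print(pi @ Q)                       # equals pi
```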
Explain Markov Chain Monte Carlo (MCMC) and describe when you would use MCMC.
When you want to sample from a distribution, it is not always possible to sample directly because you might not know the distribution explicitly. It could also be that your model is too complex to be solved analytically, or that you only have incomplete information about it.
Instead of sampling directly from the distribution, MCMC generates a sequence of samples that converge to the target distribution. By running the chain for long enough and discarding the initial samples, you can obtain a set of samples that are representative of the target distribution.
This is commonly used when you know the shape of the distribution (like a Gaussian) but you do not know the values of the parameters that determine its scale. MCMC can be used to estimate the parameters of the distribution from the data.
Explain the need for a Markov chain to have a limit distribution in order to effectively perform MCMC sampling.
For a Markov chain to be useful for MCMC sampling, it must be ergodic, which means that it must have a unique stationary distribution and it must be possible to reach this stationary distribution from any starting state. If a Markov chain does not have a limit distribution, then it will not converge to a steady state and the samples generated by the chain will not be representative of the target distribution.
For large k, we have p(Xk) ≈ p(X∞) = p(X), so xk is essentially a sample from p(X)!
Explain what the Metropolis-Hastings algorithm is and which two requirements are needed for it to be used.
The Metropolis-Hastings algorithm is a widely used method for generating samples from a target distribution using a Markov chain. The algorithm is based on the Metropolis-Hastings (MH) acceptance-rejection rule, which is used to decide whether to accept or reject a proposed move in the chain.
The Metropolis-Hastings algorithm requires two components:
- A function h(x) that is proportional to the target distribution p(X = x) = p(x). This means that there is a constant c that is not equal to zero such that h(x) = c * p(x) for all x in the state space X. This function is often much easier to obtain than the target distribution itself.
- A proposal distribution g(z|x) that is used to propose a move from the current state x to a new state z. The proposal distribution should be easy to sample from for any x in the state space X, and it should satisfy g(z|x) > 0 for all x, z in the state space X.
Explain the steps in the Metropolis-Hastings Algorithm.
- Set the initial state x1 and start iterating from k = 1.
- Sample a candidate z from the proposal distribution g(Z | xk).
- Decide to accept with probability α, or reject with probability 1 - α, where α = min(1, [h(z) g(xk | z)] / [h(xk) g(z | xk)]).
- On accept, set xk+1 = z; on reject, set xk+1 = xk.
- Go to step 2.
After sufficient “burn-in” iterations, this will generate samples from p(X). Make sure to throw away the samples from the burn-in period.
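A minimal sketch of these steps, assuming an unnormalized standard-normal target h and a symmetric Gaussian random-walk proposal (symmetry makes g(xk | z) = g(z | xk), so the g-ratio in α cancels):

```python
import numpy as np

rng = np.random.default_rng(42)

# Unnormalized target: h(x) proportional to a standard normal density.
def h(x):
    return np.exp(-0.5 * x**2)

# Proposal g(z | x): Gaussian random walk centred at the current state.
prop_std = 1.0

def metropolis_hastings(n_samples, x0=0.0, burn_in=1000):
    x = x0
    samples = []
    for k in range(n_samples + burn_in):
        z = rng.normal(x, prop_std)        # sample candidate z ~ g(z | x)
        alpha = min(1.0, h(z) / h(x))      # acceptance probability (g-ratio cancels)
        if rng.uniform() < alpha:          # accept with probability alpha
            x = z                          # x_{k+1} = z
        # on reject, x stays the same: x_{k+1} = x_k
        if k >= burn_in:                   # throw away the burn-in samples
            samples.append(x)
    return np.array(samples)

samples = metropolis_hastings(10_000)
print(samples.mean(), samples.std())       # roughly 0 and 1, as for N(0, 1)
```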
Explain how the Metropolis-Hastings algorithm constructs a markov chain of the state space X.
Metropolis-Hastings constructs a Markov chain on X, for which
q(z | x) = g(z | x)A(z | x)
where A(z | x) is the acceptance probability of z from state x, given by:
A(z | x) = min(1, [h(z) g(x | z)] / [h(x) g(z | x)])
= min(1, [p(z) g(x | z)] / [p(x) g(z | x)]) > 0
where we used that
- h(x) = c · p(x) for some c ≠ 0,
- p(x) > 0 for all x ∈ X
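As a sketch, this construction can be checked numerically on a small discrete state space: with an illustrative target p and a uniform proposal g, the resulting kernel q(z | x) = g(z | x) A(z | x) satisfies detailed balance with respect to p, so p is stationary for the chain.

```python
import numpy as np

# Illustrative discrete check of the Metropolis-Hastings kernel.
p = np.array([0.2, 0.5, 0.3])                 # target distribution (made up)
g = np.full((3, 3), 1/3)                      # uniform proposal g(z | x)

# Acceptance probability A(z | x) = min(1, p(z) g(x | z) / (p(x) g(z | x)))
A = np.minimum(1.0, (p[None, :] * g.T) / (p[:, None] * g))

# MH transition kernel: off-diagonal mass is g * A, rejected mass stays put.
q = g * A
np.fill_diagonal(q, 0.0)
np.fill_diagonal(q, 1.0 - q.sum(axis=1))

flow = p[:, None] * q
print(np.allclose(flow, flow.T))              # True: detailed balance holds
print(np.allclose(p @ q, p))                  # True: p is stationary
```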
What is guaranteed, and what is not guaranteed in relation to the Metropolis-Hastings algorithm?
- Guaranteed to converge to p(X) under mild conditions.
- Speed of convergence not guaranteed