Week 11 Flashcards
Why are methods like Markov Chain Monte Carlo (MCMC) needed in Bayesian analysis, especially compared to direct sampling methods?
Direct sampling algorithms often suffer from the curse of dimensionality and become inefficient or difficult to implement in high dimensions, where many real statistical problems lie. MCMC provides a way to sample from complex, high-dimensional posterior distributions.
What is the fundamental property of a Markov chain X₀, X₁, …, Xₙ, …?
The Markov property: the distribution of the next state Xₙ depends only on the current state Xₙ₋₁, not on the entire history X₀, …, Xₙ₋₁.
P(Xₙ ∈ A | X₀, …, Xₙ₋₁) = P(Xₙ ∈ A | Xₙ₋₁)
What is a transition kernel Q(A|x) in a time-homogeneous Markov chain?
It gives the probability that the next state Xₙ lies in a set A, given that the current state Xₙ₋₁ is x.
Q(A|x) := P(Xₙ ∈ A | Xₙ₋₁ = x)
If the transition kernel Q has a density q, how is Q(A|x) expressed?
Q(A|x) = ∫<sub>A</sub> q(y|x) dy
How is the marginal distribution of the state at step n, νₙ(A) = P(Xₙ ∈ A), calculated from the distribution at step n−1, νₙ₋₁?
By integrating the transition kernel against the previous state’s distribution:
νₙ(A) = ∫<sub>X</sub> Q(A|x) νₙ₋₁(dx)
What does it mean for a distribution Π (with pdf π) to be invariant (or stationary) for a Markov chain with transition kernel Q?
If the chain’s current state Xₙ is distributed according to Π, then the next state Xₙ₊₁ is also distributed according to Π. Mathematically:
Π(A) = ∫<sub>X</sub> Q(A|x) Π(dx)
(or π(x) = ∫<sub>X</sub> q(x|y)π(y) dy using densities).
What is the detailed balance condition for a transition density q with respect to a distribution π?
q(y|x)π(x) = q(x|y)π(y) for all x, y.
Does detailed balance imply stationarity?
Yes, detailed balance is a sufficient condition for π to be a stationary (invariant) distribution of the Markov chain defined by q.
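As a concrete check of this implication, here is a minimal sketch in Python (NumPy) for a hypothetical 3-state chain whose transition matrix Q is built to satisfy detailed balance with respect to π; the numbers are illustrative only, not from the lecture.

```python
import numpy as np

# Hypothetical 3-state example: target distribution pi and a transition
# matrix Q (rows sum to 1) constructed to satisfy detailed balance.
pi = np.array([0.2, 0.3, 0.5])
Q = np.array([
    [0.70, 0.15, 0.15],
    [0.10, 0.70, 0.20],
    [0.06, 0.12, 0.82],
])

# Detailed balance: pi[x] * Q[x, y] == pi[y] * Q[y, x] for all x, y,
# i.e. the "probability flux" matrix is symmetric.
flux = pi[:, None] * Q
assert np.allclose(flux, flux.T)

# Detailed balance implies stationarity: pi Q = pi.
assert np.allclose(pi @ Q, pi)
```

The second assertion follows from the first by summing the detailed balance identity over x, which is exactly the discrete version of the stationarity equation above.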
What does it mean for a Markov chain to be Φ-irreducible?
For any set A with positive measure under Φ (Φ(A) > 0), there is a positive probability of reaching A from any starting state x in a finite number of steps:
Qⁿ(A|x) > 0 for some n > 0, for all x ∈ X.
What does it mean for a Markov chain to be recurrent (specifically, Φ-recurrent)?
For any set A with Φ(A) > 0, if the chain starts from a state x that is typical under Φ (i.e., x lies in a set of Φ-measure 1), it visits A infinitely often (i.o.) almost surely:
P(Xₙ ∈ A i.o. | X₀ = x) = 1, for all x in a set B with Φ(B) = 1.
What is Harris recurrence, and how does it differ from standard recurrence?
Harris recurrence requires that P(Xₙ ∈ A i.o. | X₀ = x) = 1 for every starting state x ∈ X (not just Φ-almost every state), for any set A with Φ(A) > 0. It is a stronger condition than Φ-recurrence.
If a Markov chain is irreducible and has an invariant distribution Π, what properties follow (Theorem 1)?
- (a) The chain is Π-irreducible and recurrent.
- (b) Π is the unique invariant distribution.
What is the Ergodic Theorem for MCMC estimators (Theorem 2)?
If {Xₙ} is an irreducible Markov chain with invariant distribution Π, and E<sub>Π</sub>[|h|] < ∞, then the Monte Carlo average h̄ₙ = (1/n) Σ<sub>i=1</sub><sup>n</sup> h(Xᵢ) converges almost surely to the expectation E<sub>Π</sub>[h], for Π-almost all starting points X₀ = x:
P(lim<sub>n→∞</sub> h̄ₙ = E<sub>Π</sub>[h] | X₀ = x) = 1, for Π-a.s. x.
What does Total Variation (TV) convergence (||Qⁿ(·|x) − Π(·)||<sub>TV</sub> → 0) mean for a Markov chain (Theorem 3)?
It means that the distribution of the chain’s state Xₙ started from x, given by Qⁿ(·|x), becomes indistinguishable from the stationary distribution Π as n → ∞. This requires the chain to be irreducible and aperiodic.
What is a Harris-ergodic Markov chain?
A chain that is irreducible, Harris-recurrent, and aperiodic.
What is the implication of TV convergence everywhere (Theorem 4) for a Harris-ergodic chain?
The marginal distribution νₙ(·) converges to the stationary distribution Π in Total Variation, regardless of the initial distribution ν₀:
lim<sub>n→∞</sub> ||νₙ(·) − Π(·)||<sub>TV</sub> = 0
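For a finite state space this convergence can be observed numerically. Below is a sketch in Python (NumPy) with a hypothetical 3-state transition matrix (irreducible and aperiodic, chosen purely for illustration), iterating νₙ = νₙ₋₁Q from a worst-case point-mass start.

```python
import numpy as np

# Hypothetical irreducible, aperiodic 3-state transition matrix.
Q = np.array([
    [0.5, 0.3, 0.2],
    [0.2, 0.6, 0.2],
    [0.3, 0.3, 0.4],
])

# Stationary pi: left eigenvector of Q for eigenvalue 1, normalized.
w, v = np.linalg.eig(Q.T)
pi = np.real(v[:, np.argmax(np.real(w))])
pi = pi / pi.sum()

def tv(p, q):
    # Total variation distance between two discrete distributions.
    return 0.5 * np.abs(p - q).sum()

nu = np.array([1.0, 0.0, 0.0])  # point-mass nu_0: a "worst case" start
dists = []
for _ in range(50):
    dists.append(tv(nu, pi))
    nu = nu @ Q                 # nu_n = nu_{n-1} Q

# dists decreases to 0: the marginal converges to pi in TV.
```

The decay here is geometric, governed by the second-largest eigenvalue modulus of Q, which previews the definition of geometric ergodicity below.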
What does it mean for a Markov chain to be geometrically ergodic?
It is a Harris-ergodic chain for which TV convergence to the stationary distribution happens at an exponential rate:
||Qⁿ(·|x) − Π(·)||<sub>TV</sub> ≤ M(x)rⁿ
for some function M(x) with finite expectation under Π, and some 0 < r < 1.
What is the Central Limit Theorem (CLT) for MCMC estimators (Theorem 5)?
For a Harris-ergodic Markov chain under additional conditions (e.g., geometric ergodicity plus moment conditions on h), the standardized sample mean converges in distribution to a standard Normal:
√n (h̄ₙ − E<sub>Π</sub>[h]) / s<sub>h</sub> → N(0, 1)
What is the asymptotic variance s²<sub>h</sub> in the MCMC CLT? How does it differ from the i.i.d. case?
s²<sub>h</sub> = Var<sub>Π</sub>[h(X₀)] + 2 Σ<sub>k=1</sub><sup>∞</sup> Cov<sub>Π</sub>(h(X₀), h(Xₖ))
It includes the marginal variance term (Var<sub>Π</sub>) plus covariance terms accounting for the autocorrelation between samples in the chain. For i.i.d. samples, the covariance terms are zero.
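One way to see the effect of autocorrelation is to estimate s²<sub>h</sub> by truncating the covariance sum. The sketch below (Python/NumPy) uses a hypothetical AR(1) chain, whose stationary autocovariances are known in closed form, with h(x) = x; the truncation lag K is an ad-hoc choice for illustration, not a recommended estimator.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical AR(1) chain x_n = rho * x_{n-1} + eps_n: a stationary,
# autocorrelated Markov chain with N(0, 1/(1 - rho^2)) marginal.
rho, n = 0.8, 200_000
eps = rng.normal(size=n)
x = np.empty(n)
x[0] = eps[0]
for i in range(1, n):
    x[i] = rho * x[i - 1] + eps[i]

# Truncated estimate of s^2_h = Var[h] + 2 * sum_k Cov(h(X_0), h(X_k)).
xc = x - x.mean()
K = 200  # truncation lag (an assumption for this sketch)
var0 = xc @ xc / n
covs = [xc[:-k] @ xc[k:] / n for k in range(1, K + 1)]
s2 = var0 + 2 * sum(covs)

# Closed form for AR(1): Var = 1/(1 - rho^2), and
# s^2 = Var * (1 + rho)/(1 - rho) = 25 here, vs. Var ~ 2.78 if i.i.d.
theory = (1 / (1 - rho**2)) * (1 + rho) / (1 - rho)
```

With ρ = 0.8 the asymptotic variance is 9 times the marginal variance, so naive i.i.d. error bars would be far too optimistic.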
What is the goal when designing an MCMC algorithm for a target distribution Π (with density π)?
To construct a transition kernel Q such that:
- Π is the invariant distribution of the resulting Markov chain.
- The chain is ergodic (irreducible and aperiodic, preferably Harris-ergodic) to ensure convergence of the estimators and of the chain’s distribution to Π.
In the Metropolis-Hastings (MH) algorithm, what is the role of the proposal density q(y|x)?
It is an auxiliary density used to propose a candidate state X' based on the current state Xₙ₋₁. It is not the final transition kernel of the MH Markov chain.
What is the acceptance probability α(x, y) in the Metropolis-Hastings algorithm?
α(x, y) = min{1, [π(y)q(x|y)] / [π(x)q(y|x)]}
(assuming the denominator is non-zero).
How is the acceptance probability α(x, y) calculated if the target density π is known only up to a normalizing constant Z (i.e., using the kernel π*)?
Since π(y)/π(x) = π*(y)/π*(x), the unknown constant Z cancels out:
α(x, y) = min{1, [π*(y)q(x|y)] / [π*(x)q(y|x)]}
The normalizing constant is not needed.
Describe the main steps within the loop (for n = 1 to N) of the Metropolis-Hastings algorithm.
1. Propose: Sample a candidate X' ~ q(x' | Xₙ₋₁).
2. Calculate acceptance ratio: Compute α(Xₙ₋₁, X').
3. Accept/Reject: Generate U ~ U[0, 1]. If U ≤ α(Xₙ₋₁, X'), set Xₙ = X'. Otherwise (if U > α), set Xₙ = Xₙ₋₁ (reject X' and keep the previous state).
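The loop above can be sketched in Python (NumPy). This is the random-walk Metropolis special case: the Gaussian proposal is symmetric, q(y|x) = q(x|y), so the q-terms cancel and α = min{1, π*(X')/π*(Xₙ₋₁)}. The target, step size, and seed here are illustrative assumptions.

```python
import numpy as np

def metropolis_hastings(log_pi_star, x0, n_steps, step=1.0, seed=None):
    """Random-walk Metropolis with a symmetric Gaussian proposal.

    log_pi_star: log of the unnormalized target density pi*
                 (the normalizing constant Z is never needed).
    """
    rng = np.random.default_rng(seed)
    x = x0
    chain = np.empty(n_steps)
    for n in range(n_steps):
        x_prop = x + step * rng.normal()                  # 1. propose X' ~ q(.|X_{n-1})
        log_alpha = log_pi_star(x_prop) - log_pi_star(x)  # 2. log acceptance ratio
        if np.log(rng.uniform()) <= log_alpha:            # 3. accept ...
            x = x_prop
        chain[n] = x                                      # ... or keep previous state
    return chain

# Target known only up to a constant: pi*(x) = exp(-x^2/2), the N(0, 1) kernel.
chain = metropolis_hastings(lambda x: -0.5 * x * x, x0=0.0, n_steps=50_000, seed=1)
```

Working on the log scale avoids overflow in π*, and comparing log U to log α directly makes the min{1, ·} implicit, since U ≤ 1 always accepts when α ≥ 1.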