Week 11 Flashcards

1
Q

Why are methods like Markov Chain Monte Carlo (MCMC) needed in Bayesian analysis, especially compared to direct sampling methods?

A

Direct sampling algorithms often suffer from the curse of dimensionality and become inefficient or difficult to implement in high dimensions, where many real statistical problems lie. MCMC provides a way to sample from complex, high-dimensional posterior distributions.

2
Q

What is the fundamental property of a Markov chain X₀, X₁, ..., Xₙ, ...?

A

The Markov property: the distribution of the next state Xₙ depends only on the current state Xₙ₋₁, not on the earlier history X₀, ..., Xₙ₋₂.
P(Xₙ ∈ A | X₀, …, Xₙ₋₁) = P(Xₙ ∈ A | Xₙ₋₁)

3
Q

What is a transition kernel Q(A|x) in a time-homogeneous Markov chain?

A

It gives the probability that the next state Xₙ lies in a set A, given that the current state Xₙ₋₁ is x:
Q(A|x) := P(Xₙ ∈ A | Xₙ₋₁ = x)

4
Q

If the transition kernel Q has a density q, how is Q(A|x) expressed?

A

Q(A|x) = ∫_A q(y|x) dy

5
Q

How is the marginal distribution of the state at step n, νₙ(A) = P(Xₙ ∈ A), calculated from the distribution at step n−1, νₙ₋₁?

A

By integrating the transition kernel against the previous state's distribution over the state space X:
νₙ(A) = ∫_X Q(A|x) νₙ₋₁(dx)
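
For intuition, here is a minimal sketch in Python, assuming a hypothetical 3-state chain: the kernel reduces to a transition matrix P, and the update above becomes the vector-matrix product νₙ = νₙ₋₁P.

```python
import numpy as np

# Hypothetical 3-state transition matrix: row x holds Q(·|x).
P = np.array([[0.5, 0.4, 0.1],
              [0.2, 0.6, 0.2],
              [0.1, 0.4, 0.5]])

nu = np.array([1.0, 0.0, 0.0])   # ν₀: start in state 0 with probability 1
for n in range(50):
    nu = nu @ P                  # discrete analogue: νₙ(y) = Σ_x P[x, y] νₙ₋₁(x)
print(nu)                        # the iterates approach the stationary distribution
```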

6
Q

What does it mean for a distribution Π (with pdf π) to be invariant (or stationary) for a Markov chain with transition kernel Q?

A

If the chain’s current state Xₙ is distributed according to Π, then the next state Xₙ₊₁ is also distributed according to Π.
Mathematically: Π(A) = ∫_X Q(A|x) Π(dx) (or π(x) = ∫_X q(x|y) π(y) dy using densities).

7
Q

What is the detailed balance condition for a transition density q with respect to a distribution π?

A

q(y|x)π(x) = q(x|y)π(y) for all x, y.

8
Q

Does detailed balance imply stationarity?

A

Yes, detailed balance is a sufficient condition for π to be a stationary (invariant) distribution for the Markov chain defined by q.
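
A small numeric check of this implication, assuming a made-up 3-state target π and a Metropolis-style kernel built from a uniform proposal: detailed balance says the "flow" matrix with entries π(x)P[x, y] is symmetric, and summing it over x recovers stationarity.

```python
import numpy as np

pi = np.array([0.2, 0.5, 0.3])   # assumed 3-state target distribution
n = len(pi)

# Metropolis kernel with uniform proposal q(y|x) = 1/n; it satisfies detailed balance.
P = np.zeros((n, n))
for x in range(n):
    for y in range(n):
        if y != x:
            P[x, y] = (1.0 / n) * min(1.0, pi[y] / pi[x])
    P[x, x] = 1.0 - P[x].sum()   # remaining mass stays at x

flow = pi[:, None] * P           # flow[x, y] = π(x) P[x, y]
assert np.allclose(flow, flow.T)  # detailed balance holds entrywise...
assert np.allclose(pi @ P, pi)    # ...which implies stationarity: πP = π
```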

9
Q

What does it mean for a Markov chain to be Φ-irreducible?

A

For any set A with positive measure under Φ (Φ(A) > 0), there is a positive probability of reaching A from any starting state x in some finite number of steps n:
Qⁿ(A|x) > 0 for some n > 0, for all x ∈ X.

10
Q

What does it mean for a Markov chain to be recurrent (specifically, Φ-recurrent)?

A

For any set A with Φ(A) > 0, if the chain starts in a state x typical under Φ (in a set of Φ-measure 1), it visits A infinitely often (i.o.) almost surely:
P(Xₙ ∈ A i.o. | X₀ = x) = 1, for x in a set B with Φ(B) = 1.

11
Q

What is Harris recurrence, and how does it differ from standard recurrence?

A

Harris recurrence requires that P(Xₙ ∈ A i.o. | X₀ = x) = 1 for every possible starting state x ∈ X (not just almost every state under Φ), for any set A with Φ(A) > 0. It is a stronger condition.

12
Q

If a Markov chain is irreducible and has an invariant distribution Π, what properties follow (Theorem 1)?

A

(a) The chain is Π-irreducible and recurrent.
(b) Π is the unique invariant distribution.

13
Q

What is the Ergodic Theorem for MCMC estimators (Theorem 2)?

A

If {Xₙ} is an irreducible Markov chain with invariant distribution Π, and E_Π[|h|] < ∞, then the Monte Carlo average h̄ₙ = (1/n) Σ_{i=1}^n h(Xᵢ) converges almost surely to the expectation E_Π[h], for Π-almost all starting points X₀ = x:
P(lim_{n→∞} h̄ₙ = E_Π[h] | X₀ = x) = 1, for Π-a.e. x.
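
As a concrete illustration (reusing the hypothetical 3-state matrix P from the card 5 sketch), simulating the chain and averaging h along the path approximates E_Π[h]:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 3-state transition matrix from the earlier sketch.
P = np.array([[0.5, 0.4, 0.1],
              [0.2, 0.6, 0.2],
              [0.1, 0.4, 0.5]])
h = np.array([0.0, 1.0, 4.0])    # an arbitrary function h on the states

x, total = 0, 0.0
n = 100_000
for i in range(n):
    x = rng.choice(3, p=P[x])    # one Markov step: Xᵢ ~ Q(·|Xᵢ₋₁)
    total += h[x]
print("MC average:", total / n)  # ≈ E_Π[h] = Σ_x h(x) π(x)
```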

14
Q

What does Total Variation (TV) convergence (||Qⁿ(·|x) - Π(·)||_TV → 0) mean for a Markov chain (Theorem 3)?

A

It means that the distribution of the chain’s state Xₙ started from x, given by Qⁿ(·|x), becomes indistinguishable from the stationary distribution Π as n → ∞. This requires the chain to be irreducible and aperiodic.

15
Q

What is a Harris-ergodic Markov chain?

A

A chain that is irreducible, Harris-recurrent, and aperiodic.

16
Q

What is the implication of TV convergence everywhere (Theorem 4) for a Harris-ergodic chain?

A

The marginal distribution νₙ(·) converges to the stationary distribution Π in Total Variation, regardless of the initial distribution ν₀:
lim_{n→∞} ||νₙ(·) − Π(·)||_TV = 0.

17
Q

What does it mean for a Markov chain to be geometrically ergodic?

A

It is a Harris-ergodic chain for which the TV convergence to the stationary distribution happens at an exponential rate:
||Qⁿ(·|x) − Π(·)||_TV ≤ M(x) rⁿ
for some function M(x) with finite expectation under Π and some 0 < r < 1.

18
Q

What is the Central Limit Theorem (CLT) for MCMC estimators (Theorem 5)?

A

For a Harris-ergodic Markov chain under certain conditions (e.g., geometric ergodicity and moment conditions on h), the standardized sample mean converges in distribution to a standard Normal:
√n (h̄ₙ − E_Π[h]) / s_h → N(0, 1).

19
Q

What is the asymptotic variance s²_h in the MCMC CLT? How does it differ from the i.i.d. case?

A

s²_h = Var_Π[h(X₀)] + 2 Σ_{k=1}^∞ Cov_Π(h(X₀), h(Xₖ)).
It is the stationary variance Var_Π[h(X₀)] plus terms accounting for the autocorrelation between samples in the chain. For i.i.d. samples the covariance terms are zero, and s²_h reduces to the usual variance.
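
In practice s²_h is unknown and must be estimated from the chain itself. One common approach is batch means; here is a sketch under the assumption that h_values holds the evaluations h(X₁), …, h(Xₙ) from a single post-burn-in chain:

```python
import numpy as np

def batch_means_se(h_values, n_batches=30):
    # Split the chain into consecutive batches; the variance of the
    # batch averages absorbs the autocorrelation captured by s²_h.
    n = (len(h_values) // n_batches) * n_batches
    batches = np.asarray(h_values[:n]).reshape(n_batches, -1)
    batch_avgs = batches.mean(axis=1)
    # Var(batch mean) ≈ s²_h / batch_size, so s²_h ≈ batch_size * Var.
    s2_h = batches.shape[1] * batch_avgs.var(ddof=1)
    return np.sqrt(s2_h / n)     # estimated standard error of h̄ₙ
```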

20
Q

What is the goal when designing an MCMC algorithm for a target distribution Π (with density π)?

A

To construct a transition kernel Q such that:
- Π is the invariant distribution of the resulting Markov chain.
- The chain is ergodic (irreducible and aperiodic, preferably Harris-ergodic), to ensure convergence of the estimators and of the distribution itself to Π.

21
Q

In the Metropolis-Hastings (MH) algorithm, what is the role of the proposal density q(y|x)?

A

It is an auxiliary density used to propose a candidate state X' based on the current state Xₙ₋₁. It is not the final transition kernel of the MH Markov chain.

22
Q

What is the acceptance probability α(x, y) in the Metropolis-Hastings algorithm?

A

α(x, y) = min{1, [π(y)q(x|y)] / [π(x)q(y|x)]} (assuming the denominator is non-zero).

23
Q

How is the acceptance probability α(x, y) calculated if the target density π is known only up to a normalizing constant Z (i.e., using the kernel π*)?

A

Since π(y)/π(x) = π*(y)/π*(x), the unknown constant Z cancels out:
α(x, y) = min{1, [π*(y)q(x|y)] / [π*(x)q(y|x)]}.
The normalizing constant is not needed.

24
Q

Describe the main steps within the loop (for n = 1 to N) of the Metropolis-Hastings algorithm.

A

1. Propose: sample a candidate X' ~ q(x' | Xₙ₋₁).
2. Calculate the acceptance ratio: compute α(Xₙ₋₁, X').
3. Accept/reject: generate U ~ U[0, 1]. If U ≤ α(Xₙ₋₁, X'), set Xₙ = X'. Otherwise (if U > α), set Xₙ = Xₙ₋₁ (reject X' and keep the previous state).
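
A minimal runnable sketch of this loop, assuming a hypothetical unnormalized 1-D target π*(x) = exp(−x⁴/4) and a Gaussian random-walk proposal (symmetric, so the q-ratio in α cancels; see cards 27-28). Working on the log scale avoids overflow, and, as in card 23, only the unnormalized kernel π* is needed.

```python
import numpy as np

rng = np.random.default_rng(0)

def log_pi_star(x):
    return -x**4 / 4   # hypothetical unnormalized target, on the log scale

def metropolis_hastings(log_target, x0, n_steps, step=1.0):
    x, accepted = x0, 0
    chain = np.empty(n_steps)
    for n in range(n_steps):
        x_prop = x + step * rng.standard_normal()        # 1. propose X'
        log_alpha = log_target(x_prop) - log_target(x)   # 2. log acceptance ratio
        if np.log(rng.uniform()) <= log_alpha:           # 3. accept with prob. α
            x, accepted = x_prop, accepted + 1
        chain[n] = x                                     # on rejection, Xₙ = Xₙ₋₁
    print(f"acceptance rate: {accepted / n_steps:.2f}")
    return chain

chain = metropolis_hastings(log_pi_star, x0=0.0, n_steps=10_000)
print("estimate of E[X²]:", (chain[2_000:] ** 2).mean())  # discard burn-in (card 25)
```
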
25
Q

What is burn-in in MCMC?

A

Discarding the initial portion of the Markov chain samples (e.g., X₀ to X_B for some B) because the chain may not yet have converged to its stationary distribution.
26
Q

What does it mean if the MH acceptance rate is very low? What does it mean if it is very high?

A

Low acceptance rate: often means the proposal q generates candidates X' far from Xₙ₋₁, in regions where π is low, leading to many rejections and high correlation (many duplicates Xₙ = Xₙ₋₁).
High acceptance rate: often means the proposals are very close to Xₙ₋₁, so π(X') ≈ π(Xₙ₋₁) and q(Xₙ₋₁|X') ≈ q(X'|Xₙ₋₁), giving α ≈ 1. The chain moves very slowly and samples are highly correlated.
27
Q

What is the Metropolis algorithm?

A

A special case of Metropolis-Hastings where the proposal density is symmetric: q(y|x) = q(x|y). The acceptance probability simplifies to α(x, y) = min{1, π(y)/π(x)}.
28
Q

What is a Random Walk Metropolis proposal?

A

A symmetric proposal where X' = x + Z, with Z drawn from a symmetric distribution p centered at 0 (e.g., N(0, σ²) or U(−a, a)). Then q(y|x) = p(y − x).
29
Q

What is an Independence Sampler (or independence chain) proposal?

A

A proposal where the candidate X' is drawn independently of the current state x: X' ~ p, so q(y|x) = p(y). The acceptance probability is α(x, y) = min{1, [π(y)p(x)] / [π(x)p(y)]}.
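
For contrast with the random-walk case, here is a sketch of the independence-sampler step, reusing the hypothetical target π*(x) = exp(−x⁴/4) from the card 24 sketch, with an assumed proposal p = N(0, 1):

```python
import numpy as np

rng = np.random.default_rng(1)

def log_pi_star(x):
    return -x**4 / 4      # hypothetical unnormalized target

def log_p(x):
    return -x**2 / 2      # N(0, 1) proposal density, up to a constant

x, chain = 0.0, []
for _ in range(10_000):
    x_prop = rng.standard_normal()        # X' ~ p, independent of the current x
    # α = min{1, [π(y)p(x)] / [π(x)p(y)]}, computed on the log scale:
    log_alpha = (log_pi_star(x_prop) + log_p(x)) - (log_pi_star(x) + log_p(x_prop))
    if np.log(rng.uniform()) <= log_alpha:
        x = x_prop
    chain.append(x)
```

A proposal p with heavier tails than π is generally preferred here; if p has lighter tails than π, the sampler can get stuck in the tails and may fail to be geometrically ergodic.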