Week 11 Flashcards

1
Q

Why are methods like Markov Chain Monte Carlo (MCMC) needed in Bayesian analysis, especially compared to direct sampling methods?

A

Direct sampling algorithms often suffer from the curse of dimensionality and become inefficient or difficult to implement in high dimensions, where many real statistical problems lie. MCMC provides a way to sample from complex, high-dimensional posterior distributions.

2
Q

What is the fundamental property of a Markov chain X₀, X₁, ..., Xₙ, ...?

A

The Markov property: the distribution of the next state Xₙ depends only on the current state Xₙ₋₁, not on the earlier history X₀, ..., Xₙ₋₂.
P(Xₙ ∈ A | X₀, …, Xₙ₋₁) = P(Xₙ ∈ A | Xₙ₋₁)

3
Q

What is a transition kernel Q(A|x) in a time-homogeneous Markov chain?

A

It gives the probability that the next state Xₙ lies in a set A, given that the current state Xₙ₋₁ is x:
Q(A|x) := P(Xₙ ∈ A | Xₙ₋₁ = x)

4
Q

If the transition kernel Q has a density q, how is Q(A|x) expressed?

A

Q(A|x) = ∫_A q(y|x) dy

5
Q

How is the marginal distribution of the state at step n, νₙ(A) = P(Xₙ ∈ A), calculated from the distribution at step n−1, νₙ₋₁?

A

By integrating the transition kernel against the previous state's distribution over the state space X:
νₙ(A) = ∫_X Q(A|x) νₙ₋₁(dx)
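
For intuition, here is a minimal sketch in Python, assuming a hypothetical 3-state chain: the kernel reduces to a transition matrix P, and the update above becomes the vector-matrix product νₙ = νₙ₋₁P.

```python
import numpy as np

# Hypothetical 3-state transition matrix: row x holds Q(·|x).
P = np.array([[0.5, 0.4, 0.1],
              [0.2, 0.6, 0.2],
              [0.1, 0.4, 0.5]])

nu = np.array([1.0, 0.0, 0.0])   # ν₀: start in state 0 with probability 1
for n in range(50):
    nu = nu @ P                  # discrete analogue: νₙ(y) = Σ_x P[x, y] νₙ₋₁(x)
print(nu)                        # the iterates approach the stationary distribution
```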

6
Q

What does it mean for a distribution Π (with pdf π) to be invariant (or stationary) for a Markov chain with transition kernel Q?

A

If the chain’s current state Xₙ is distributed according to Π, then the next state Xₙ₊₁ is also distributed according to Π.
Mathematically: Π(A) = ∫_X Q(A|x) Π(dx) (or π(x) = ∫_X q(x|y) π(y) dy using densities).

7
Q

What is the detailed balance condition for a transition density q with respect to a distribution π?

A

q(y|x)π(x) = q(x|y)π(y) for all x, y.

8
Q

Does detailed balance imply stationarity?

A

Yes, detailed balance is a sufficient condition for π to be a stationary (invariant) distribution for the Markov chain defined by q.
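
A small numeric check of this implication, assuming a made-up 3-state target π and a Metropolis-style kernel built from a uniform proposal: detailed balance says the "flow" matrix with entries π(x)P[x, y] is symmetric, and summing it over x recovers stationarity.

```python
import numpy as np

pi = np.array([0.2, 0.5, 0.3])   # assumed 3-state target distribution
n = len(pi)

# Metropolis kernel with uniform proposal q(y|x) = 1/n; it satisfies detailed balance.
P = np.zeros((n, n))
for x in range(n):
    for y in range(n):
        if y != x:
            P[x, y] = (1.0 / n) * min(1.0, pi[y] / pi[x])
    P[x, x] = 1.0 - P[x].sum()   # remaining mass stays at x

flow = pi[:, None] * P           # flow[x, y] = π(x) P[x, y]
assert np.allclose(flow, flow.T)  # detailed balance holds entrywise...
assert np.allclose(pi @ P, pi)    # ...which implies stationarity: πP = π
```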

9
Q

What does it mean for a Markov chain to be Φ-irreducible?

A

For any set A with positive measure under Φ (Φ(A) > 0), there is a positive probability of reaching A from any starting state x in some finite number of steps n:
Qⁿ(A|x) > 0 for some n > 0, for all x ∈ X.

10
Q

What does it mean for a Markov chain to be recurrent (specifically, Φ-recurrent)?

A

For any set A with Φ(A) > 0, if the chain starts in a state x typical under Φ (in a set of Φ-measure 1), it visits A infinitely often (i.o.) almost surely:
P(Xₙ ∈ A i.o. | X₀ = x) = 1, for x in a set B with Φ(B) = 1.

11
Q

What is Harris recurrence, and how does it differ from standard recurrence?

A

Harris recurrence requires that P(Xₙ ∈ A i.o. | X₀ = x) = 1 for every possible starting state x ∈ X (not just almost every state under Φ), for any set A with Φ(A) > 0. It is a stronger condition.

12
Q

If a Markov chain is irreducible and has an invariant distribution Π, what properties follow (Theorem 1)?

A

(a) The chain is Π-irreducible and recurrent.
(b) Π is the unique invariant distribution.

13
Q

What is the Ergodic Theorem for MCMC estimators (Theorem 2)?

A

If {Xₙ} is an irreducible Markov chain with invariant distribution Π, and E_Π[|h|] < ∞, then the Monte Carlo average h̄ₙ = (1/n) Σ_{i=1}^n h(Xᵢ) converges almost surely to the expectation E_Π[h], for Π-almost all starting points X₀ = x:
P(lim_{n→∞} h̄ₙ = E_Π[h] | X₀ = x) = 1, for Π-a.e. x.
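
As a concrete illustration (reusing the hypothetical 3-state matrix P from the card 5 sketch), simulating the chain and averaging h along the path approximates E_Π[h]:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 3-state transition matrix from the earlier sketch.
P = np.array([[0.5, 0.4, 0.1],
              [0.2, 0.6, 0.2],
              [0.1, 0.4, 0.5]])
h = np.array([0.0, 1.0, 4.0])    # an arbitrary function h on the states

x, total = 0, 0.0
n = 100_000
for i in range(n):
    x = rng.choice(3, p=P[x])    # one Markov step: Xᵢ ~ Q(·|Xᵢ₋₁)
    total += h[x]
print("MC average:", total / n)  # ≈ E_Π[h] = Σ_x h(x) π(x)
```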

14
Q

What does Total Variation (TV) convergence (||Qⁿ(·|x) - Π(·)||_TV → 0) mean for a Markov chain (Theorem 3)?

A

It means that the distribution of the chain’s state Xₙ started from x, given by Qⁿ(·|x), becomes indistinguishable from the stationary distribution Π as n → ∞. This requires the chain to be irreducible and aperiodic.

15
Q

What is a Harris-ergodic Markov chain?

A

A chain that is irreducible, Harris-recurrent, and aperiodic.

16
Q

What is the implication of TV convergence everywhere (Theorem 4) for a Harris-ergodic chain?

A

The marginal distribution νₙ(·) converges to the stationary distribution Π in Total Variation, regardless of the initial distribution ν₀:
lim_{n→∞} ||νₙ(·) − Π(·)||_TV = 0.

17
Q

What does it mean for a Markov chain to be geometrically ergodic?

A

It is a Harris-ergodic chain for which the TV convergence to the stationary distribution happens at an exponential rate:
||Qⁿ(·|x) − Π(·)||_TV ≤ M(x) rⁿ
for some function M(x) with finite expectation under Π and some 0 < r < 1.

18
Q

What is the Central Limit Theorem (CLT) for MCMC estimators (Theorem 5)?

A

For a Harris-ergodic Markov chain under certain conditions (e.g., geometric ergodicity and moment conditions on h), the standardized sample mean converges in distribution to a standard Normal:
√n (h̄ₙ − E_Π[h]) / s_h → N(0, 1).

19
Q

What is the asymptotic variance s²_h in the MCMC CLT? How does it differ from the i.i.d. case?

A

s²_h = Var_Π[h(X₀)] + 2 Σ_{k=1}^∞ Cov_Π(h(X₀), h(Xₖ)).
It is the stationary variance Var_Π[h(X₀)] plus terms accounting for the autocorrelation between samples in the chain. For i.i.d. samples the covariance terms are zero, and s²_h reduces to the usual variance.
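
In practice s²_h is unknown and must be estimated from the chain itself. One common approach is batch means; here is a sketch under the assumption that h_values holds the evaluations h(X₁), …, h(Xₙ) from a single post-burn-in chain:

```python
import numpy as np

def batch_means_se(h_values, n_batches=30):
    # Split the chain into consecutive batches; the variance of the
    # batch averages absorbs the autocorrelation captured by s²_h.
    n = (len(h_values) // n_batches) * n_batches
    batches = np.asarray(h_values[:n]).reshape(n_batches, -1)
    batch_avgs = batches.mean(axis=1)
    # Var(batch mean) ≈ s²_h / batch_size, so s²_h ≈ batch_size * Var.
    s2_h = batches.shape[1] * batch_avgs.var(ddof=1)
    return np.sqrt(s2_h / n)     # estimated standard error of h̄ₙ
```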

20
Q

What is the goal when designing an MCMC algorithm for a target distribution Π (with density π)?

A

To construct a transition kernel Q such that:
- Π is the invariant distribution of the resulting Markov chain.
- The chain is ergodic (irreducible and aperiodic, preferably Harris-ergodic), to ensure convergence of the estimators and of the distribution itself to Π.

21
Q

In the Metropolis-Hastings (MH) algorithm, what is the role of the proposal density q(y|x)?

A

It is an auxiliary density used to propose a candidate state X' based on the current state Xₙ₋₁. It is not the final transition kernel of the MH Markov chain.

22
Q

What is the acceptance probability α(x, y) in the Metropolis-Hastings algorithm?

A

α(x, y) = min{1, [π(y)q(x|y)] / [π(x)q(y|x)]} (assuming the denominator is non-zero).

23
Q

How is the acceptance probability α(x, y) calculated if the target density π is known only up to a normalizing constant Z (i.e., using the kernel π*)?

A

Since π(y)/π(x) = π*(y)/π*(x), the unknown constant Z cancels out:
α(x, y) = min{1, [π*(y)q(x|y)] / [π*(x)q(y|x)]}.
The normalizing constant is not needed.

24
Q

Describe the main steps within the loop (for n = 1 to N) of the Metropolis-Hastings algorithm.

A

1. Propose: sample a candidate X' ~ q(x' | Xₙ₋₁).
2. Calculate the acceptance ratio: compute α(Xₙ₋₁, X').
3. Accept/reject: generate U ~ U[0, 1]. If U ≤ α(Xₙ₋₁, X'), set Xₙ = X'. Otherwise (if U > α), set Xₙ = Xₙ₋₁ (reject X' and keep the previous state).
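
A minimal runnable sketch of this loop, assuming a hypothetical unnormalized 1-D target π*(x) = exp(−x⁴/4) and a Gaussian random-walk proposal (symmetric, so the q-ratio in α cancels; see cards 27-28). Working on the log scale avoids overflow, and, as in card 23, only the unnormalized kernel π* is needed.

```python
import numpy as np

rng = np.random.default_rng(0)

def log_pi_star(x):
    return -x**4 / 4   # hypothetical unnormalized target, on the log scale

def metropolis_hastings(log_target, x0, n_steps, step=1.0):
    x, accepted = x0, 0
    chain = np.empty(n_steps)
    for n in range(n_steps):
        x_prop = x + step * rng.standard_normal()        # 1. propose X'
        log_alpha = log_target(x_prop) - log_target(x)   # 2. log acceptance ratio
        if np.log(rng.uniform()) <= log_alpha:           # 3. accept with prob. α
            x, accepted = x_prop, accepted + 1
        chain[n] = x                                     # on rejection, Xₙ = Xₙ₋₁
    print(f"acceptance rate: {accepted / n_steps:.2f}")
    return chain

chain = metropolis_hastings(log_pi_star, x0=0.0, n_steps=10_000)
print("estimate of E[X²]:", (chain[2_000:] ** 2).mean())  # discard burn-in (card 25)
```
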
25
Q

What is burn-in in MCMC?

A

Discarding the initial portion of the Markov chain samples (e.g., X₀ to X_B for some B) because the chain may not yet have converged to its stationary distribution.
26
Q

What does it mean if the MH acceptance rate is very low? What does it mean if it is very high?

A

Low acceptance rate: often means the proposal q generates candidates X' far from Xₙ₋₁, in regions where π is low, leading to many rejections and high correlation (many duplicates Xₙ = Xₙ₋₁).
High acceptance rate: often means the proposals are very close to Xₙ₋₁, so π(X') ≈ π(Xₙ₋₁) and q(Xₙ₋₁|X') ≈ q(X'|Xₙ₋₁), giving α ≈ 1. The chain moves very slowly and samples are highly correlated.
27
Q

What is the Metropolis algorithm?

A

A special case of Metropolis-Hastings where the proposal density is symmetric: q(y|x) = q(x|y). The acceptance probability simplifies to α(x, y) = min{1, π(y)/π(x)}.
28
Q

What is a Random Walk Metropolis proposal?

A

A symmetric proposal where X' = x + Z, with Z drawn from a symmetric distribution p centered at 0 (e.g., N(0, σ²) or U(−a, a)). Then q(y|x) = p(y − x).
29
Q

What is an Independence Sampler (or independence chain) proposal?

A

A proposal where the candidate X' is drawn independently of the current state x: X' ~ p, so q(y|x) = p(y). The acceptance probability is α(x, y) = min{1, [π(y)p(x)] / [π(x)p(y)]}.
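
For contrast with the random-walk case, here is a sketch of the independence-sampler step, reusing the hypothetical target π*(x) = exp(−x⁴/4) from the card 24 sketch, with an assumed proposal p = N(0, 1):

```python
import numpy as np

rng = np.random.default_rng(1)

def log_pi_star(x):
    return -x**4 / 4      # hypothetical unnormalized target

def log_p(x):
    return -x**2 / 2      # N(0, 1) proposal density, up to a constant

x, chain = 0.0, []
for _ in range(10_000):
    x_prop = rng.standard_normal()        # X' ~ p, independent of the current x
    # α = min{1, [π(y)p(x)] / [π(x)p(y)]}, computed on the log scale:
    log_alpha = (log_pi_star(x_prop) + log_p(x)) - (log_pi_star(x) + log_p(x_prop))
    if np.log(rng.uniform()) <= log_alpha:
        x = x_prop
    chain.append(x)
```

A proposal p with heavier tails than π is generally preferred here; if p has lighter tails than π, the sampler can get stuck in the tails and may fail to be geometrically ergodic.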