Week 6 Flashcards

1
Q

In the context of Bayesian asymptotics, what does it mean for a posterior distribution π(θ|xn) to be consistent?

A

It means that as the sample size n increases, the posterior distribution concentrates its mass arbitrarily close to the true parameter value θ₀. Formally, π(θ|xn) converges weakly to δ_θ₀ (a point mass at θ₀), P_θ₀-almost surely.

2
Q

State the intuitive meaning of posterior consistency using neighborhoods.

A

For any neighborhood U(θ₀) around the true parameter θ₀, the posterior probability of θ lying within that neighborhood converges to 1 as n → ∞ (P_θ₀-a.s.). That is, ∫_U(θ₀) π(θ|xn) dθ → 1.

3
Q

What condition does Doob’s Theorem require for posterior consistency?

A

It requires the statistical model to be identifiable (i.e., P_θ ≠ P_θ′ whenever θ ≠ θ′).

4
Q

What does Doob’s Theorem guarantee about posterior consistency?

A

It guarantees that the posterior distribution π(θ|xn) will be consistent for all θ₀ in a set Θ₀ that has full measure under the prior π₀ (i.e., ∫_Θ₀ π₀(θ) dθ = 1). Consistency holds except possibly on a set of θ values with prior measure zero.

5
Q

What is a practical criticism of Doob’s Theorem?

A

It guarantees consistency only outside a set of prior measure zero, and that exceptional set can still be large or practically relevant. For example, if the prior concentrates all its mass at θ = 0, the theorem says nothing about consistency at any non-zero θ₀, so the ‘technical’ guarantee can be misleading in practice.

6
Q

How can posterior consistency be checked using the posterior mean and variance?

A

If E[θ|xn] → θ₀ and Var[θ|xn] → 0 as n → ∞ (P_θ₀-a.s.), then the posterior is consistent.
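
As a concrete illustration of this check (a minimal sketch assuming a conjugate Beta(a, b) prior and i.i.d. Bernoulli(θ₀) data; the model and all constants are illustrative, not part of the card):

```python
import numpy as np

rng = np.random.default_rng(0)
theta0, a, b = 0.3, 2.0, 2.0      # true parameter and illustrative Beta(a, b) prior

for n in (10, 100, 1_000, 10_000):
    x = rng.binomial(1, theta0, size=n)
    s = x.sum()
    # Conjugacy: theta | xn ~ Beta(a + s, b + n - s)
    pa, pb = a + s, b + n - s
    post_mean = pa / (pa + pb)
    post_var = pa * pb / ((pa + pb) ** 2 * (pa + pb + 1))
    print(f"n={n:6d}  E[theta|xn]={post_mean:.4f}  Var[theta|xn]={post_var:.6f}")

# The posterior mean approaches theta0 = 0.3 and the posterior variance
# shrinks toward 0, which is exactly the sufficient condition stated above.
```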

7
Q

What does the convergence in Total Variation (TV) distance between posteriors derived from different priors (π₁ and π₂) imply asymptotically?

A

It implies that ||π₁(θ|xn) - π₂(θ|xn)||_TV → 0 as n → ∞ (P_θ₀-a.s.), provided both priors assign positive probability to every neighborhood of the true θ₀. This shows that the influence of the prior diminishes as the sample size increases: with enough data, analysts starting from different priors reach essentially the same posterior.
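
A small numerical illustration of this merging of posteriors (a sketch assuming a Bernoulli(θ₀) model with two different Beta priors; all constants are illustrative). The TV distance ½∫|π₁ − π₂| dθ is computed by numerical integration on a grid:

```python
import numpy as np
from scipy.stats import beta

rng = np.random.default_rng(1)
theta0 = 0.4
grid = np.linspace(1e-6, 1 - 1e-6, 20_000)
dt = grid[1] - grid[0]

for n in (10, 100, 1_000, 10_000):
    s = rng.binomial(n, theta0)                 # number of successes (sufficient statistic)
    p1 = beta.pdf(grid, 1 + s, 1 + n - s)       # posterior under a Beta(1, 1) prior
    p2 = beta.pdf(grid, 5 + s, 2 + n - s)       # posterior under a Beta(5, 2) prior
    tv = 0.5 * np.sum(np.abs(p1 - p2)) * dt     # TV = (1/2) * integral of |p1 - p2|
    print(f"n={n:6d}  TV distance ~ {tv:.4f}")

# The TV distance shrinks toward 0 as n grows: with enough data the two
# priors lead to practically indistinguishable posteriors.
```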

8
Q

What is the typical limiting distribution of the Maximum Likelihood Estimator (MLE) θ̂_n^ML in frequentist statistics?

A

√n (θ̂_n^ML − θ₀) converges in distribution to a Normal distribution N(0, I(θ₀)⁻¹), where I(θ₀) is the Fisher information matrix at the true value θ₀.
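
A quick Monte Carlo check of this limit (a sketch assuming i.i.d. Bernoulli(θ₀) observations, where the MLE is the sample mean and I(θ₀) = 1/(θ₀(1 − θ₀)); the constants are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
theta0, n, reps = 0.3, 2_000, 50_000
fisher = 1.0 / (theta0 * (1.0 - theta0))   # I(theta0) for the Bernoulli model

# The MLE of a Bernoulli probability is the sample mean; simulate it many times.
mle = rng.binomial(n, theta0, size=reps) / n
z = np.sqrt(n) * (mle - theta0)            # centred and scaled MLE

print("empirical variance of sqrt(n)(MLE - theta0):", round(float(z.var()), 4))
print("theoretical variance I(theta0)^-1          :", round(1.0 / fisher, 4))
```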

9
Q

What is the main result of the Bernstein-von Mises (BvM) theorem regarding the asymptotic shape of the posterior distribution?

A

Under regularity conditions, the posterior distribution π(θ|xn), when properly centered and scaled, converges to a Normal distribution.

10
Q

According to BvM, what Normal distribution approximates the posterior distribution of θ|xn for large n?

A

θ|xn ≈ N_p(θ̂_n^ML, [Î_n(θ̂_n^ML)]⁻¹), where θ̂_n^ML is the MLE and Î_n(θ̂_n^ML) is the observed Fisher information matrix evaluated at the MLE.
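
To make the statement concrete (a sketch assuming a Bernoulli(θ₀) model with a Beta(a, b) prior, so p = 1, θ̂_n^ML = s/n and Î_n(θ̂_n^ML) = n/(θ̂(1 − θ̂)); all constants are illustrative):

```python
import numpy as np
from scipy.stats import beta, norm

rng = np.random.default_rng(4)
theta0, a, b, n = 0.3, 2.0, 2.0, 500
s = rng.binomial(n, theta0)

mle = s / n
obs_info = n / (mle * (1 - mle))           # observed Fisher information at the MLE

grid = np.linspace(0.22, 0.38, 5)
exact = beta.pdf(grid, a + s, b + n - s)                  # exact conjugate posterior
approx = norm.pdf(grid, loc=mle, scale=obs_info ** -0.5)  # BvM normal approximation

for t, e, v in zip(grid, exact, approx):
    print(f"theta={t:.3f}  exact posterior={e:8.3f}  BvM normal={v:8.3f}")
```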

11
Q

How does the BvM theorem relate Bayesian credible sets and frequentist confidence intervals?

A

It implies that for large n, Bayesian credible sets and frequentist confidence intervals (based on the MLE) tend to coincide, suggesting Bayesian inference is asymptotically calibrated from a frequentist perspective.
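
A numerical illustration of this coincidence (a sketch assuming a Bernoulli model with a flat Beta(1, 1) prior; the 95% level and all constants are illustrative):

```python
import numpy as np
from scipy.stats import beta, norm

rng = np.random.default_rng(5)
theta0, n = 0.3, 1_000
s = rng.binomial(n, theta0)
mle = s / n

# 95% equal-tailed credible interval from the exact Beta(1 + s, 1 + n - s) posterior
cred = beta.ppf([0.025, 0.975], 1 + s, 1 + n - s)

# 95% Wald confidence interval based on the MLE and the observed information
se = np.sqrt(mle * (1 - mle) / n)
conf = mle + norm.ppf([0.025, 0.975]) * se

print("95% credible interval  :", np.round(cred, 4))
print("95% confidence interval:", np.round(conf, 4))
# For large n the two intervals nearly coincide, as the BvM theorem predicts.
```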

12
Q

Does the prior π₀(θ) influence the limiting Normal distribution in the BvM theorem?

A

No. Provided the prior density is positive and continuous at θ₀, it does not appear in the limiting Normal distribution; its influence vanishes asymptotically as data accumulate.

13
Q

What is the Laplace Approximation used for?

A

It provides an analytical approximation to the marginal likelihood m(xn) = ∫ L(θ,xn)π₀(θ) dθ, which is often intractable, especially in high dimensions.

14
Q

Define the ‘energy function’ g(θ) in the context of Laplace Approximation.

A

g(θ) = -log[L(θ, xn)π₀(θ)], i.e., the negative logarithm of the unnormalized posterior density (kernel).

15
Q

What is the core idea behind the Laplace Approximation of m(xn)?

A

Approximate g(θ) by a quadratic function (its second-order Taylor expansion) around its minimum θ̃ (which is the posterior mode), then integrate the resulting Gaussian function analytically.
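
Spelled out, the expansion step looks as follows (using H(θ̃) = ∇²g(θ̃) as in the next card; the linear term drops out because ∇g(θ̃) = 0 at the mode):

g(θ) ≈ g(θ̃) + ½ (θ − θ̃)ᵀ H(θ̃) (θ − θ̃), so

m(xn) = ∫ exp{−g(θ)} dθ ≈ exp{−g(θ̃)} ∫ exp{−½ (θ − θ̃)ᵀ H(θ̃) (θ − θ̃)} dθ = exp{−g(θ̃)} (2π)^(p/2) det[H(θ̃)]^(−1/2),

where the last step is the standard multivariate Gaussian integral.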

16
Q

Let θ̃ be the posterior mode (minimizing g(θ)) and H(θ̃) = ∇²g(θ̃) be the Hessian matrix at the mode. What is the Laplace approximation formula for m(xn)?

A

m(xn) ≈ exp{−g(θ̃)} · (2π)^(p/2) · det[H(θ̃)]^(−1/2)
= L(θ̃, xn) π₀(θ̃) · (2π)^(p/2) · det[H(θ̃)]^(−1/2)
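
A minimal numerical sketch of this formula (assuming a Bernoulli model with a Beta(a, b) prior, where both the Laplace approximation and the exact marginal likelihood are available in closed form; all constants are illustrative):

```python
import numpy as np
from scipy.special import betaln

rng = np.random.default_rng(3)
theta0, a, b, n = 0.3, 2.0, 2.0, 200
x = rng.binomial(1, theta0, size=n)
s = x.sum()

def g(theta):
    """Energy: minus log of likelihood * prior (the unnormalised posterior kernel)."""
    return -((s + a - 1) * np.log(theta) + (n - s + b - 1) * np.log(1 - theta) - betaln(a, b))

# Posterior mode and Hessian of g at the mode (closed form here, p = 1).
mode = (s + a - 1) / (n + a + b - 2)
hess = (s + a - 1) / mode**2 + (n - s + b - 1) / (1 - mode)**2

log_m_laplace = -g(mode) + 0.5 * np.log(2 * np.pi) - 0.5 * np.log(hess)
log_m_exact = betaln(a + s, b + n - s) - betaln(a, b)   # conjugacy gives the exact marginal

print("log m(xn), Laplace:", round(float(log_m_laplace), 4))
print("log m(xn), exact  :", round(float(log_m_exact), 4))
```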

17
Q

What condition makes the Laplace approximation accurate?

A

The approximation works well when the posterior distribution is unimodal and well approximated by a Gaussian, which typically happens for large sample sizes (by the Bernstein-von Mises theorem).

18
Q

What quantity does the Hessian H(θ̃) in the Laplace approximation correspond to?

A

It is the observed information matrix evaluated at the posterior mode θ̃ (also called the generalized observed information matrix if the prior is included).