Week 10 Flashcards
What is the primary goal of Monte Carlo integration in Bayesian inference?
To estimate posterior expectations of functions, E[h(θ)|x] = ∫ h(θ) π(θ|x) dθ, which are often analytically intractable, especially in high dimensions.
Given n independent samples θ₁, ..., θn from the posterior π(θ|x), what is the Monte Carlo estimate h̄n of E[h(θ)|x]?
h̄n = (1/n) Σᵢ₌₁ⁿ h(θᵢ)
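A minimal sketch of this estimator in Python, assuming posterior draws are already available (here simulated from a hypothetical Gamma(3, rate 2) posterior) and taking h(θ) = θ² as the function of interest:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical posterior: pi(theta | x) = Gamma(shape=3, rate=2),
# and we want E[h(theta) | x] with h(theta) = theta^2.
theta = rng.gamma(shape=3.0, scale=1.0 / 2.0, size=10_000)  # posterior samples

h = theta ** 2
h_bar = h.mean()  # Monte Carlo estimate: (1/n) * sum of h(theta_i)

print(h_bar)  # exact value: Var + mean^2 = 3/4 + (3/2)^2 = 3
```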
According to the Weak Law of Large Numbers (WLLN), how does the Monte Carlo estimator h̄n behave as n → ∞?
h̄n converges in probability to E[h(θ)|x].
According to the Strong Law of Large Numbers (SLLN), if Var[h(θ)|x] < ∞, how does the Monte Carlo estimator h̄n behave as n → ∞?
h̄n converges almost surely to E[h(θ)|x].
How is the variance of the Monte Carlo estimator h̄n related to the variance of h(θ) under the posterior?
Var[h̄n|x] = (1/n) * Var[h(θ)|x]
How can Var[h(θ)|x] be estimated from the posterior samples θ₁, ..., θn?
Using the sample variance: s²n = (1/n) Σᵢ₌₁ⁿ [h(θᵢ) - h̄n]². (Note: sometimes the 1/(n-1) version is used for unbiasedness, but 1/n is asymptotically equivalent and is what the notes use.)
What is the Monte Carlo error of the estimator h̄n?
It is the estimated standard deviation of h̄n: √(Var[h̄n|x]) ≈ sn / √n.
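Continuing the hypothetical Gamma-posterior sketch from above, the sample variance, Monte Carlo error, and a CLT-based 95% interval can be computed as follows:

```python
import numpy as np

rng = np.random.default_rng(0)
theta = rng.gamma(shape=3.0, scale=0.5, size=10_000)  # hypothetical posterior draws
h = theta ** 2

n = len(h)
h_bar = h.mean()
s2 = np.mean((h - h_bar) ** 2)   # 1/n version of the sample variance
mc_error = np.sqrt(s2 / n)       # Monte Carlo error: s_n / sqrt(n)

# Approximate 95% interval for E[h(theta)|x], justified by the CLT
ci = (h_bar - 1.96 * mc_error, h_bar + 1.96 * mc_error)
print(h_bar, mc_error, ci)
```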
What does the Central Limit Theorem (CLT) state about the distribution of the Monte Carlo estimator h̄n for large n?
√n * (h̄n - E[h(θ)|x]) / sn converges in distribution to a standard Normal distribution, N(0,1).
What is the typical rate of convergence for the Monte Carlo error? How does this compare to methods like Simpson’s rule?
The Monte Carlo error converges at a rate of 1/√n. This is slower than 1D numerical methods (like Simpson's 1/n⁴ rate), but it does not degrade as badly with increasing dimension and requires fewer assumptions (only finite variance).
Describe the Inversion Method for generating a sample X from a distribution with CDF F.
1. Generate U ~ U[0,1].
2. Compute X = F⁻¹(U), where F⁻¹ is the inverse CDF (quantile function).
What is the main requirement for using the Inversion Method?
The inverse CDF F⁻¹ must be known and computable, ideally in closed form.
Is the Inversion Method practical for high-dimensional distributions?
Generally no, due to the difficulty of deriving and inverting high-dimensional CDFs.
What are the three main requirements for the Accept-Reject Method to sample from a target density f using a proposal density g?
1. The support of f must be contained within the support of g (supp(f) ⊂ supp(g)).
2. We must be able to generate samples from g.
3. There must exist a constant M ≥ 1 such that f(x) ≤ M * g(x) for all x.
Describe the steps of the standard Accept-Reject Algorithm (Algorithm 2).
1. Generate a candidate Y ~ g.
2. Generate U ~ U[0,1].
3. If U ≤ f(Y) / (M * g(Y)), accept X = Y.
4. Otherwise, reject Y and return to Step 1.
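A minimal sketch of Algorithm 2 for a Beta(2, 2) target with a U[0, 1] proposal; the target, proposal, and envelope constant M = sup f/g = f(1/2) = 1.5 are illustrative choices, not from the notes:

```python
import numpy as np

rng = np.random.default_rng(2)

def target_pdf(x):
    """Beta(2, 2) density on [0, 1]: f(x) = 6 x (1 - x)."""
    return 6.0 * x * (1.0 - x)

def accept_reject(n, rng):
    """Accept-Reject with proposal g = U[0, 1] (so g(y) = 1) and envelope M = 1.5."""
    M = 1.5  # sup_x f(x) / g(x), attained at x = 0.5
    samples = []
    while len(samples) < n:
        y = rng.uniform()                       # 1. candidate Y ~ g
        u = rng.uniform()                       # 2. U ~ U[0, 1]
        if u <= target_pdf(y) / (M * 1.0):      # 3. accept if U <= f(Y) / (M * g(Y))
            samples.append(y)                   # 4. otherwise loop back to step 1
    return np.array(samples)

x = accept_reject(10_000, rng)
print(x.mean())  # Beta(2, 2) has mean 0.5
```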
Describe the steps of the Alternative Accept-Reject Algorithm (Algorithm 3).
1. Generate a candidate Y ~ g.
2. Generate U ~ U[0, M * g(Y)].
3. If U ≤ f(Y), accept X = Y.
4. Otherwise, reject Y and return to Step 1.
What is the probability of accepting a candidate Y in the Accept-Reject method?
The acceptance probability is 1/M.
What determines the efficiency of the Accept-Reject method? How can efficiency be improved?
Efficiency is determined by the acceptance probability 1/M. Efficiency increases as M gets closer to 1, which is achieved by choosing a proposal density g that closely matches the shape of the target f, minimizing the required envelope constant M.
What is the optimal value M* for the constant M for a given proposal g?
M* = sup_{x ∈ supp(f)} [f(x) / g(x)].
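For the illustrative Beta(2, 2) / U[0, 1] pair used above, M* can be found analytically (f(1/2)/1 = 1.5); a rough numerical sketch of the same supremum via a grid search:

```python
import numpy as np

# Crude grid approximation of M* = sup_{x in supp(f)} f(x) / g(x)
# for the Beta(2, 2) target and U[0, 1] proposal (supp(f) = [0, 1]).
x = np.linspace(1e-6, 1.0 - 1e-6, 100_001)
f = 6.0 * x * (1.0 - x)   # Beta(2, 2) density
g = np.ones_like(x)       # U[0, 1] density

M_star = np.max(f / g)
print(M_star)  # close to the exact value f(0.5) = 1.5
```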
Can Accept-Reject sampling be used if f and g are known only up to their normalizing constants (i.e., using kernels f̃ and g̃)?
Yes. Find M̃ such that f̃(x) ≤ M̃ * g̃(x) for all x, and run the algorithm using f̃, g̃, and M̃. The acceptance probability becomes a / (b * M̃), where a and b are the unknown normalizing constants of f and g.
Why does the efficiency of Accept-Reject sampling often deteriorate in high dimensions?
It becomes increasingly difficult to find a proposal g that matches f well everywhere, leading to a large required constant M and thus a very low acceptance probability (the 'curse of dimensionality').
What is the core idea of Importance Sampling for estimating ∫ h(x)f(x)dx?
Introduce an 'importance density' g(x) from which we can sample, and rewrite the integral as ∫ [h(x) * f(x) / g(x)] g(x) dx. This expectation is then estimated using samples from g.
Given n i.i.d. samples X₁, ..., Xn from the importance density g, what is the basic Importance Sampling estimator for ∫ h(x)f(x)dx?
(1/n) Σᵢ₌₁ⁿ h(Xᵢ) w(Xᵢ), where w(Xᵢ) = f(Xᵢ) / g(Xᵢ) are the importance weights.
What is the key requirement on the support of the importance density g relative to f and h?
The support of g must cover all points x where |h(x)|f(x) > 0; otherwise the estimator is biased (mass where g = 0 is never sampled), and regions where g is very small relative to |h(x)|f(x) produce huge weights and can give infinite variance.
What is a guideline for choosing a good importance density g to achieve low variance?
Choose g such that the ratio |h(x)|f(x)/g(x) is as constant as possible. Often this means g should resemble |h(x)|f(x) in shape and have heavier tails than f.