Week 10 Flashcards

1
Q

What is the primary goal of Monte Carlo integration in Bayesian inference?

A

To estimate posterior expectations of functions, E[h(θ)|x] = ∫ h(θ)π(θ|x)dθ, which are often analytically intractable, especially in high dimensions.

2
Q

Given n independent samples θ₁, ..., θn from the posterior π(θ|x), what is the Monte Carlo estimate h̄n of E[h(θ)|x]?

A

h̄n = (1/n) Σᵢ₌₁ⁿ h(θᵢ)
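
A minimal sketch of this estimator, assuming for illustration a Gamma(3, rate 2) posterior that can be sampled directly and h(θ) = θ² (these choices are not from the notes):

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative assumption: pi(theta | x) is Gamma(shape=3, rate=2), so we can
# sample it directly; h(theta) = theta^2 is the function of interest.
n = 100_000
theta = rng.gamma(shape=3.0, scale=1 / 2.0, size=n)   # posterior samples
h = theta ** 2

h_bar = h.mean()    # Monte Carlo estimate of E[h(theta) | x]
print(h_bar)        # exact value is Var + mean^2 = 3/4 + (3/2)^2 = 3.0
```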

3
Q

According to the Weak Law of Large Numbers (WLLN), how does the Monte Carlo estimator h̄n behave as n → ∞?

A

h̄n converges in probability to E[h(θ)|x].

4
Q

According to the Strong Law of Large Numbers (SLLN), if Var[h(θ)|x] < ∞, how does the Monte Carlo estimator h̄n behave as n → ∞?

A

h̄n converges almost surely to E[h(θ)|x].

5
Q

How is the variance of the Monte Carlo estimator h̄n related to the variance of h(θ) under the posterior?

A

Var[h̄n|x] = (1/n) * Var[h(θ)|x]

6
Q

How can Var[h(θ)|x] be estimated from the posterior samples θ₁, ..., θn?

A

Using the sample variance of the h(θᵢ): Var[h(θ)|x] ≈ s²n = (1/n) Σᵢ₌₁ⁿ [h(θᵢ) - h̄n]². (Sometimes the 1/(n-1) version is used for unbiasedness, but 1/n is asymptotically equivalent and is the version used in the notes.)

7
Q

What is the Monte Carlo error of the estimator h̄n?

A

It is the estimated standard deviation of h̄n: √(Var[h̄n|x]) ≈ sn / √n.

8
Q

What does the Central Limit Theorem (CLT) state about the distribution of the Monte Carlo estimator h̄n for large n?

A

√n * (h̄n - E[h(θ)|x]) / sn converges in distribution to a standard Normal distribution, N(0,1).
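
A sketch of the Monte Carlo error and a CLT-based 95% interval, using the same illustrative Gamma(3, rate 2) "posterior" and h(θ) = θ² as above (not from the notes):

```python
import numpy as np

rng = np.random.default_rng(1)

n = 100_000
theta = rng.gamma(shape=3.0, scale=0.5, size=n)   # illustrative posterior samples
h = theta ** 2

h_bar = h.mean()
s_n = h.std()                  # sqrt of the (1/n) sample variance of the h(theta_i)
mc_error = s_n / np.sqrt(n)    # estimated standard deviation of h_bar

# By the CLT, h_bar is approximately N(E[h | x], mc_error^2) for large n,
# which gives an approximate 95% confidence interval.
print(h_bar, mc_error)
print((h_bar - 1.96 * mc_error, h_bar + 1.96 * mc_error))
```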

9
Q

What is the typical rate of convergence for the Monte Carlo error? How does this compare to methods like Simpson’s rule?

A

The Monte Carlo error converges at a rate of 1/√n. This is slower than 1D numerical methods (like Simpson’s 1/n⁴ rate) but doesn’t degrade as badly with increasing dimension and requires fewer assumptions (only finite variance).

10
Q

Describe the Inversion Method for generating a sample X from a distribution with CDF F.

A

1. Generate U ~ U[0,1].
2. Compute X = F⁻¹(U), where F⁻¹ is the inverse CDF (quantile function).
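
A minimal sketch for the Exp(λ) distribution, whose inverse CDF F⁻¹(u) = -log(1 - u)/λ is available in closed form (λ = 2 is just for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_exponential(lam, n, rng):
    """Inversion method: X = F^{-1}(U) with F(x) = 1 - exp(-lam * x)."""
    u = rng.uniform(0.0, 1.0, size=n)   # Step 1: U ~ U[0, 1]
    return -np.log(1.0 - u) / lam       # Step 2: X = F^{-1}(U)

x = sample_exponential(lam=2.0, n=100_000, rng=rng)
print(x.mean())   # should be close to 1/lam = 0.5
```
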
11
Q

What is the main requirement for using the Inversion Method?

A

The inverse CDF F⁻¹ must be known and computable, ideally in closed form.

12
Q

Is the Inversion Method practical for high-dimensional distributions?

A

Generally no, due to the difficulty of deriving and inverting high-dimensional CDFs.

13
Q

What are the three main requirements for the Accept-Reject Method to sample from a target density f using a proposal density g?

A

1. The support of f must be contained within the support of g (supp(f) ⊂ supp(g)).
2. We must be able to generate samples from g.
3. There must exist a constant M ≥ 1 such that f(x) ≤ M * g(x) for all x.
14
Q

Describe the steps of the standard Accept-Reject Algorithm (Algorithm 2).

A

1. Generate a candidate Y ~ g.
2. Generate U ~ U[0,1].
3. If U ≤ f(Y) / (M * g(Y)), accept X = Y.
4. Otherwise, reject Y and return to Step 1.
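
A sketch of this algorithm for an illustrative target f = Beta(2, 2) (density 6x(1 - x) on [0, 1]) with a U[0, 1] proposal, for which M = sup f(x)/g(x) = 1.5 (this example is not from the notes):

```python
import numpy as np

rng = np.random.default_rng(0)

def f(x):            # target density: Beta(2, 2)
    return 6.0 * x * (1.0 - x)

g_density = 1.0      # proposal g = U[0, 1], so g(x) = 1 on [0, 1]
M = 1.5              # sup_x f(x) / g(x), attained at x = 1/2

def accept_reject(n, rng):
    samples = []
    while len(samples) < n:
        y = rng.uniform()                    # Step 1: candidate Y ~ g
        u = rng.uniform()                    # Step 2: U ~ U[0, 1]
        if u <= f(y) / (M * g_density):      # Step 3: accept if U <= f(Y) / (M g(Y))
            samples.append(y)
        # Step 4: otherwise reject Y and return to Step 1 (the loop continues)
    return np.array(samples)

x = accept_reject(10_000, rng)
print(x.mean(), x.var())   # Beta(2, 2) has mean 0.5 and variance 0.05
```
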
15
Q

Describe the steps of the Alternative Accept-Reject Algorithm (Algorithm 3).

A

1. Generate a candidate Y ~ g.
2. Generate U ~ U[0, M * g(Y)].
3. If U ≤ f(Y), accept X = Y.
4. Otherwise, reject Y and return to Step 1.
16
Q

What is the probability of accepting a candidate Y in the Accept-Reject method?

A

The acceptance probability is 1/M.

17
Q

What determines the efficiency of the Accept-Reject method? How can efficiency be improved?

A

Efficiency is determined by the acceptance probability 1/M. Efficiency increases as M gets closer to 1. This is achieved by choosing a proposal density g that closely matches the shape of the target f, minimizing the required envelope constant M.

18
Q

What is the optimal value M* for the constant M for a given proposal g?

A

M* = sup{ f(x) / g(x) : x ∈ supp(f) }.

19
Q

Can Accept-Reject sampling be used if f and g are known only up to their normalizing constants (i.e., using only their kernels f̃ and g̃)?

A

Yes. Find M̃ such that f̃(x) ≤ M̃ * g̃(x), and run the algorithm using f̃, g̃, and M̃ in place of f, g, and M. The acceptance probability becomes a / (b * M̃), where a and b are the unknown normalizing constants of f and g.

20
Q

Why does the efficiency of Accept-Reject sampling often deteriorate in high dimensions?

A

It becomes increasingly difficult to find a proposal g that matches f well everywhere, leading to a large required constant M and thus a very low acceptance probability (the ‘curse of dimensionality’).

21
Q

What is the core idea of Importance Sampling for estimating ∫ h(x)f(x)dx?

A

Introduce an ‘importance density’ g(x) from which we can sample, and rewrite the integral as ∫ [h(x) * f(x) / g(x)] g(x) dx. This expectation is then estimated using samples from g.

22
Q

Given n i.i.d. samples X₁, ..., Xn from the importance density g, what is the basic Importance Sampling estimator for ∫ h(x)f(x)dx?

A

(1/n) Σᵢ₌₁ⁿ h(X_i) * w(X_i), where w(X_i) = f(X_i) / g(X_i) are the importance weights.
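
A sketch of this estimator for an illustrative rare-event problem (not from the notes): h(x) = 1{x > 4}, target f = N(0, 1), and a shifted proposal g = N(4, 1) that places its mass where h·f is non-negligible:

```python
import numpy as np

rng = np.random.default_rng(0)

def phi(x, mu):   # N(mu, 1) density
    return np.exp(-0.5 * (x - mu) ** 2) / np.sqrt(2.0 * np.pi)

n = 100_000
x = rng.normal(loc=4.0, scale=1.0, size=n)   # X_i ~ g = N(4, 1)

h = (x > 4.0).astype(float)                  # h(x) = 1{x > 4}
w = phi(x, 0.0) / phi(x, 4.0)                # importance weights f(X_i) / g(X_i)

estimate = np.mean(h * w)                    # (1/n) * sum_i h(X_i) w(X_i)
print(estimate)    # P(X > 4) under N(0, 1) is about 3.17e-5
```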

23
Q

What is the key requirement on the support of the importance density g relative to f and h?

A

The support of g must cover all points x where |h(x)|f(x) > 0, i.e., g(x) > 0 wherever |h(x)|f(x) > 0; otherwise the weights f(x)/g(x) involve division by zero and contributions to the integral are missed.

24
Q

What is a guideline for choosing a good importance density g to achieve low variance?

A

Choose g such that the product |h(x)|f(x)/g(x) is as constant as possible. Often, this means g should resemble |h(x)|f(x) and have heavier tails than f.

25
Q

What is **Self-Normalized Importance Sampling** used for?

A

When the importance weights `w(x) = f(x)/g(x)` can only be calculated up to an unknown normalizing constant (e.g., `f` and/or `g` are unnormalized densities).

26
Q

What is the formula for the Self-Normalized Importance Sampling estimator?

A

`[Σᵢ₌₁ⁿ h(X_i) * w(X_i)] / [Σᵢ₌₁ⁿ w(X_i)]`, where `X_i ~ g` and `w(X_i)` are the (possibly unnormalized) importance weights.
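
A sketch of the self-normalized estimator with an unnormalized target kernel (illustration, not from the notes): f̃(x) = exp(-x²/2), the N(0, 1) kernel with its constant dropped, proposal g = Student-t with 3 degrees of freedom (heavier tails than f), estimating E[X²] = 1:

```python
import numpy as np
from math import gamma

rng = np.random.default_rng(0)

def f_tilde(x):        # unnormalized N(0, 1) kernel (normalizing constant unknown)
    return np.exp(-0.5 * x ** 2)

def g(x, df=3):        # Student-t(df) proposal density
    c = gamma((df + 1) / 2) / (np.sqrt(df * np.pi) * gamma(df / 2))
    return c * (1 + x ** 2 / df) ** (-(df + 1) / 2)

n = 100_000
x = rng.standard_t(df=3, size=n)        # X_i ~ g
w = f_tilde(x) / g(x)                   # unnormalized importance weights
h = x ** 2                              # estimating E[X^2] under f

estimate = np.sum(h * w) / np.sum(w)    # self-normalized IS estimator
print(estimate)    # close to 1
```
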
27
Q

What is a key difference between the basic IS estimator and the self-normalized IS estimator regarding bias?

A

The basic IS estimator is unbiased. The self-normalized IS estimator is biased for finite *n*, but it is asymptotically unbiased (consistent).

28
Q

What stricter condition on supports is required for self-normalized importance sampling compared to basic importance sampling?

A

The full condition supp(f) ⊂ supp(g) is required for self-normalized IS (not just coverage of the set where |h(x)|f(x) > 0), because the denominator Σ w(X_i) must correctly estimate the normalizing constant.

29
Q

What is the goal of **Sampling Importance Resampling (SIR)**?

A

To convert weighted samples obtained from importance sampling into an approximately unweighted sample from the target distribution `f`, primarily to reduce variance caused by highly variable weights.

30
Q

Describe the three main steps of the **Sampling Importance Resampling (SIR)** algorithm.

A

1. **Sample:** Draw `X₁, ..., Xn ~ g` from the proposal distribution.
2. **Weight:** Calculate importance weights `w(X_i) = f(X_i) / g(X_i)` and normalize them: `w̃(X_i) = w(X_i) / Σⱼ₌₁ⁿ w(X_j)`.
3. **Resample:** Draw `Y₁, ..., Yn` by sampling *with replacement* from the set `{X₁, ..., Xn}` according to the normalized weights `w̃(X_i)`.
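
A sketch of SIR with an illustrative unnormalized target kernel f̃(x) = exp(-x²/2) (i.e., N(0, 1)) and proposal g = N(1, 2²); the unknown constants cancel when the weights are normalized (this example is not from the notes):

```python
import numpy as np

rng = np.random.default_rng(0)

n = 50_000

# Step 1 (Sample): draw X_1, ..., X_n from the proposal g = N(1, 2^2).
x = rng.normal(loc=1.0, scale=2.0, size=n)

# Step 2 (Weight): log of the unnormalized weights, log f~(x) - log g~(x),
# then normalize (subtracting the max first for numerical stability).
log_w = -0.5 * x ** 2 - (-0.5 * ((x - 1.0) / 2.0) ** 2)
w = np.exp(log_w - log_w.max())
w_tilde = w / w.sum()

# Step 3 (Resample): draw Y_1, ..., Y_n with replacement according to w~.
y = rng.choice(x, size=n, replace=True, p=w_tilde)

print(y.mean(), y.std())   # approximately the N(0, 1) moments: 0 and 1
```
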
31
Q

What is the potential problem with SIR, especially when the importance weights are very uneven?

A

**Sample depletion** (or impoverishment): the resampled set `{Y₁, ..., Yn}` may contain many duplicates of only a few original samples `X_i` that had very high weights, leading to a loss of diversity and potentially poor estimates.