Week 10 Flashcards
What is the primary goal of Monte Carlo integration in Bayesian inference?
To estimate posterior expectations of functions, E[h(θ)|x] = ∫ h(θ) π(θ|x) dθ, which are often analytically intractable, especially in high dimensions.
Given n independent samples θ₁, ..., θn from the posterior π(θ|x), what is the Monte Carlo estimate h̄n of E[h(θ)|x]?
h̄n = (1/n) Σᵢ₌₁ⁿ h(θᵢ)
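A minimal sketch of this estimator in Python, assuming posterior draws are already available (here simulated from a hypothetical Gamma(3, rate 2) posterior) and taking h(θ) = θ² as the function of interest:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical posterior: pi(theta | x) = Gamma(shape=3, rate=2),
# and we want E[h(theta) | x] with h(theta) = theta^2.
theta = rng.gamma(shape=3.0, scale=1.0 / 2.0, size=10_000)  # posterior samples

h = theta ** 2
h_bar = h.mean()  # Monte Carlo estimate: (1/n) * sum of h(theta_i)

print(h_bar)  # exact value: Var + mean^2 = 3/4 + (3/2)^2 = 3
```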
According to the Weak Law of Large Numbers (WLLN), how does the Monte Carlo estimator h̄n behave as n → ∞?
h̄n converges in probability to E[h(θ)|x].
According to the Strong Law of Large Numbers (SLLN), if Var[h(θ)|x] < ∞, how does the Monte Carlo estimator h̄n behave as n → ∞?
h̄n converges almost surely to E[h(θ)|x].
How is the variance of the Monte Carlo estimator h̄n related to the variance of h(θ) under the posterior?
Var[h̄n|x] = (1/n) * Var[h(θ)|x]
How can Var[h(θ)|x] be estimated from the posterior samples θ₁, ..., θn?
Using the sample variance: s²n = (1/n) Σᵢ₌₁ⁿ [h(θᵢ) - h̄n]². (Note: sometimes the 1/(n-1) version is used for unbiasedness, but 1/n is asymptotically equivalent and is what the notes use.)
What is the Monte Carlo error of the estimator h̄n?
It is the estimated standard deviation of h̄n: √(Var[h̄n|x]) ≈ sn / √n.
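Continuing the hypothetical Gamma-posterior sketch from above, the sample variance, Monte Carlo error, and a CLT-based 95% interval can be computed as follows:

```python
import numpy as np

rng = np.random.default_rng(0)
theta = rng.gamma(shape=3.0, scale=0.5, size=10_000)  # hypothetical posterior draws
h = theta ** 2

n = len(h)
h_bar = h.mean()
s2 = np.mean((h - h_bar) ** 2)   # 1/n version of the sample variance
mc_error = np.sqrt(s2 / n)       # Monte Carlo error: s_n / sqrt(n)

# Approximate 95% interval for E[h(theta)|x], justified by the CLT
ci = (h_bar - 1.96 * mc_error, h_bar + 1.96 * mc_error)
print(h_bar, mc_error, ci)
```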
What does the Central Limit Theorem (CLT) state about the distribution of the Monte Carlo estimator h̄n for large n?
√n * (h̄n - E[h(θ)|x]) / sn converges in distribution to a standard Normal distribution, N(0,1).
What is the typical rate of convergence for the Monte Carlo error? How does this compare to methods like Simpson’s rule?
The Monte Carlo error converges at a rate of 1/√n. This is slower than 1D numerical methods (like Simpson's 1/n⁴ rate), but it does not degrade as badly with increasing dimension and requires fewer assumptions (only finite variance).
Describe the Inversion Method for generating a sample X from a distribution with CDF F.
1. Generate U ~ U[0,1].
2. Compute X = F⁻¹(U), where F⁻¹ is the inverse CDF (quantile function).
What is the main requirement for using the Inversion Method?
The inverse CDF F⁻¹ must be known and computable, ideally in closed form.
Is the Inversion Method practical for high-dimensional distributions?
Generally no, due to the difficulty of deriving and inverting high-dimensional CDFs.
What are the three main requirements for the Accept-Reject Method to sample from a target density f using a proposal density g?
1. The support of f must be contained within the support of g (supp(f) ⊂ supp(g)).
2. We must be able to generate samples from g.
3. There must exist a constant M ≥ 1 such that f(x) ≤ M * g(x) for all x.
Describe the steps of the standard Accept-Reject Algorithm (Algorithm 2).
1. Generate a candidate Y ~ g.
2. Generate U ~ U[0,1].
3. If U ≤ f(Y) / (M * g(Y)), accept X = Y.
4. Otherwise, reject Y and return to Step 1.
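A minimal sketch of Algorithm 2 for a Beta(2, 2) target with a U[0, 1] proposal; the target, proposal, and envelope constant M = sup f/g = f(1/2) = 1.5 are illustrative choices, not from the notes:

```python
import numpy as np

rng = np.random.default_rng(2)

def target_pdf(x):
    """Beta(2, 2) density on [0, 1]: f(x) = 6 x (1 - x)."""
    return 6.0 * x * (1.0 - x)

def accept_reject(n, rng):
    """Accept-Reject with proposal g = U[0, 1] (so g(y) = 1) and envelope M = 1.5."""
    M = 1.5  # sup_x f(x) / g(x), attained at x = 0.5
    samples = []
    while len(samples) < n:
        y = rng.uniform()                       # 1. candidate Y ~ g
        u = rng.uniform()                       # 2. U ~ U[0, 1]
        if u <= target_pdf(y) / (M * 1.0):      # 3. accept if U <= f(Y) / (M * g(Y))
            samples.append(y)                   # 4. otherwise loop back to step 1
    return np.array(samples)

x = accept_reject(10_000, rng)
print(x.mean())  # Beta(2, 2) has mean 0.5
```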
Describe the steps of the Alternative Accept-Reject Algorithm (Algorithm 3).
1. Generate a candidate Y ~ g.
2. Generate U ~ U[0, M * g(Y)].
3. If U ≤ f(Y), accept X = Y.
4. Otherwise, reject Y and return to Step 1.
What is the probability of accepting a candidate Y in the Accept-Reject method?
The acceptance probability is 1/M.
What determines the efficiency of the Accept-Reject method? How can efficiency be improved?
Efficiency is determined by the acceptance probability 1/M. Efficiency increases as M gets closer to 1, which is achieved by choosing a proposal density g that closely matches the shape of the target f, minimizing the required envelope constant M.
What is the optimal value M* for the constant M for a given proposal g?
M* = sup_{x ∈ supp(f)} [f(x) / g(x)].
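For the illustrative Beta(2, 2) / U[0, 1] pair used above, M* can be found analytically (f(1/2)/1 = 1.5); a rough numerical sketch of the same supremum via a grid search:

```python
import numpy as np

# Crude grid approximation of M* = sup_{x in supp(f)} f(x) / g(x)
# for the Beta(2, 2) target and U[0, 1] proposal (supp(f) = [0, 1]).
x = np.linspace(1e-6, 1.0 - 1e-6, 100_001)
f = 6.0 * x * (1.0 - x)   # Beta(2, 2) density
g = np.ones_like(x)       # U[0, 1] density

M_star = np.max(f / g)
print(M_star)  # close to the exact value f(0.5) = 1.5
```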
Can Accept-Reject sampling be used if f and g are known only up to their normalizing constants (i.e., using kernels f̃ and g̃)?
Yes. Find M̃ such that f̃(x) ≤ M̃ * g̃(x) for all x, and run the algorithm using f̃, g̃, and M̃. The acceptance probability becomes a / (b * M̃), where a and b are the unknown normalizing constants of f and g.
Why does the efficiency of Accept-Reject sampling often deteriorate in high dimensions?
It becomes increasingly difficult to find a proposal g that matches f well everywhere, leading to a large required constant M and thus a very low acceptance probability (the 'curse of dimensionality').
What is the core idea of Importance Sampling for estimating ∫ h(x)f(x)dx?
Introduce an 'importance density' g(x) from which we can sample, and rewrite the integral as ∫ [h(x) * f(x) / g(x)] g(x) dx. This expectation is then estimated using samples from g.
Given n i.i.d. samples X₁, ..., Xn from the importance density g, what is the basic Importance Sampling estimator for ∫ h(x)f(x)dx?
(1/n) Σᵢ₌₁ⁿ h(Xᵢ) w(Xᵢ), where w(Xᵢ) = f(Xᵢ) / g(Xᵢ) are the importance weights.
What is the key requirement on the support of the importance density g relative to f and h?
The support of g must cover all points x where |h(x)|f(x) > 0; otherwise the estimator is biased (mass where g = 0 is never sampled), and regions where g is very small relative to |h(x)|f(x) produce huge weights and can give infinite variance.
What is a guideline for choosing a good importance density g to achieve low variance?
Choose g such that the ratio |h(x)|f(x)/g(x) is as constant as possible. Often this means g should resemble |h(x)|f(x) in shape and have heavier tails than f.