Week 8 Flashcards

1
Q

What is the goal of the Gibbs sampler algorithm?

A

To obtain a sample from a multivariate posterior distribution π(θ|x) where θ = (θ₁, ..., θd).

2
Q

Describe the initialization step (step 1) of the Gibbs sampler.

A

Initialize the parameter vector with starting values: θ⁽⁰⁾ = (θ₁⁽⁰⁾, ..., θd⁽⁰⁾).

3
Q

Describe the iterative sampling step (step 2) in the Gibbs sampler for θ = (θ₁, ..., θd).

A

For each iteration i = 1, ..., m:
- Simulate θ₁⁽ⁱ⁾ from the full conditional π(θ₁ | θ₂⁽ⁱ⁻¹⁾, ..., θd⁽ⁱ⁻¹⁾, x).
- Simulate θ₂⁽ⁱ⁾ from π(θ₂ | θ₁⁽ⁱ⁾, θ₃⁽ⁱ⁻¹⁾, ..., θd⁽ⁱ⁻¹⁾, x).
- In general, simulate θⱼ⁽ⁱ⁾ from π(θⱼ | θ₁⁽ⁱ⁾, ..., θⱼ₋₁⁽ⁱ⁾, θⱼ₊₁⁽ⁱ⁻¹⁾, ..., θd⁽ⁱ⁻¹⁾, x).
- Finally, simulate θd⁽ⁱ⁾ from π(θd | θ₁⁽ⁱ⁾, ..., θd₋₁⁽ⁱ⁾, x).
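A minimal Python sketch of this sweep structure, assuming the full conditionals are supplied as a user-defined function (the names gibbs_sampler and sample_conditional are hypothetical, not from the notes):

```python
import numpy as np

def gibbs_sampler(theta0, sample_conditional, m, rng=None):
    """Run m sweeps of a generic Gibbs sampler.

    theta0             : length-d array of starting values theta^(0)
    sample_conditional : user-supplied function (j, theta, rng) returning a draw of
                         theta_j from its full conditional pi(theta_j | theta_{-j}, x)
    """
    rng = rng if rng is not None else np.random.default_rng()
    theta = np.asarray(theta0, dtype=float).copy()
    d = len(theta)
    samples = np.empty((m, d))
    for i in range(m):              # iteration i = 1, ..., m
        for j in range(d):          # one 'sweep'/'scan': update each component in turn
            # entries 0..j-1 already hold iteration-i values;
            # entries j+1..d-1 still hold iteration-(i-1) values
            theta[j] = sample_conditional(j, theta, rng)
        samples[i] = theta
    return samples
```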

4
Q

What constitutes one ‘sweep’ or ‘scan’ of the Gibbs sampler?

A

Completing the simulation of all components θ₁⁽ⁱ⁾ through θd⁽ⁱ⁾ for a single iteration i.

5
Q

What is the final output of the Gibbs sampler algorithm after m iterations?

A

A collection of samples θ⁽¹⁾, θ⁽²⁾, ..., θ⁽ᵐ⁾ which, after a suitable burn-in period is discarded, are treated as (correlated) draws from the target posterior distribution π(θ|x).

6
Q

When is the Gibbs sampler particularly useful?

A

When the full conditional distributions π(θⱼ | θ₋ⱼ, x) are easy to sample from (e.g., standard distributions), often due to conditional conjugacy.
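As a concrete illustration of conditional conjugacy, here is a hedged sketch of a Gibbs sampler for a Normal model with unknown mean μ and precision τ; the toy data, prior choices, and hyperparameter values below are all assumptions for illustration, and both full conditionals come out as standard distributions.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(2.0, 1.5, size=100)      # toy data, modelled as x_i ~ N(mu, 1/tau)
n, xbar = len(x), x.mean()

# assumed semi-conjugate priors: mu ~ N(mu0, 1/kappa0), tau ~ Gamma(a0, b0)
mu0, kappa0 = 0.0, 0.01
a0, b0 = 0.01, 0.01

m = 5000
mu, tau = 0.0, 1.0                      # starting values theta^(0)
draws = np.empty((m, 2))
for i in range(m):
    # full conditional of mu is Normal (conditional conjugacy)
    prec = kappa0 + n * tau
    mu = rng.normal((kappa0 * mu0 + tau * n * xbar) / prec, np.sqrt(1.0 / prec))
    # full conditional of tau is Gamma (NumPy uses shape, scale)
    tau = rng.gamma(a0 + n / 2.0, 1.0 / (b0 + 0.5 * np.sum((x - mu) ** 2)))
    draws[i] = mu, tau
```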

7
Q

What are the three stages typically defined in a Bayesian hierarchical model?

A
- Stage I: the data model/likelihood, x_i | θ_i ~ f(x_i | θ_i).
- Stage II: the prior for the parameters, θ_i | φ ~ π₀(θ_i | φ).
- Stage III: the hyperprior for the hyperparameters, φ ~ π₀(φ).
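A tiny generative sketch of the three stages; the Normal choices and hyperparameter values are assumptions made purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 8                                   # number of groups/units (assumed)

# Stage III: hyperprior, phi ~ N(0, 10^2)                        (assumed choice)
phi = rng.normal(0.0, 10.0)

# Stage II: parameters drawn exchangeably, theta_i | phi ~ N(phi, 2^2)   (assumed)
theta = rng.normal(phi, 2.0, size=n)

# Stage I: data model/likelihood, x_i | theta_i ~ N(theta_i, 1)
x = rng.normal(theta, 1.0)
```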
8
Q

In a hierarchical model, what does it mean for the parameters θ = (θ₁, ..., θn) to be generated exchangeably?

A

They are assumed to be drawn from a common population distribution governed by a hyperparameter φ.

9
Q

Write the proportionality relationship for the joint posterior distribution π(φ, θ | x) in a standard hierarchical model.

A

π(φ, θ | x) ∝ [∏ᵢ₌₁ⁿ f(x_i | θ_i) × π₀(θ_i | φ)] × π₀(φ).

10
Q

What is the main idea behind using auxiliary variables in MCMC (like Gibbs sampling)?

A

To introduce additional variables U such that the joint distribution π(θ, u | x) has the target marginal π(θ|x), but the full conditionals π(θ | u, x) and π(u | θ, x) are easier to sample from.

11
Q

What are two desired properties when introducing auxiliary variables U?

A
1. The full conditionals π(θ | u, x) and π(u | θ, x) are straightforward to sample from.
2. The introduction of U breaks complex dependence structures among the original variables θ.
12
Q

In the context of a K-component finite mixture model f_y(y | θ) = Σₖ₌₁ᴷ ωk fk(y | θk), what auxiliary variables U₁, ..., Un are introduced?

A

Discrete labels U_i indicating which component density (f₁, ..., fK) generated the i-th data point y_i.
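A toy simulation of this augmentation for a two-component Normal mixture; the weights, means, and Normal component densities are assumptions made for the sketch.

```python
import numpy as np

rng = np.random.default_rng(3)
K, n = 2, 200
omega = np.array([0.3, 0.7])            # mixture weights (assumed)
mu = np.array([-2.0, 3.0])              # component means, f_k = N(mu_k, 1) (assumed)

u = rng.choice(K, size=n, p=omega)      # auxiliary labels U_i with P(U_i = k) = omega_k
y = rng.normal(mu[u], 1.0)              # y_i drawn from the component picked by u_i
```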

13
Q

What is the distribution of the auxiliary variable U_i in the mixture model example?

A

U_i follows a Categorical (or Multinomial(1, ω₁, ..., ωK)) distribution with P(U_i = k) = ωk for k = 1, ..., K.

14
Q

In the mixture model Gibbs sampler with auxiliary variables u, the conditional posterior π(θ|u, y) factorizes into which two independent parts?

A

It factorizes into π(θ₁, ..., θK | u, y) and π(ω₁, ..., ωK | u, y).

15
Q

If the prior on (θ₁, ..., θK) factorizes as ∏ₖ₌₁ᴷ π₀(θk), how is the conditional posterior π(θk | u, y) updated for component k?

A

It depends only on the data points y_i for which u_i = k. Specifically, π(θk | u, y) ∝ [∏_{i : u_i = k} fk(y_i | θk)] × π₀(θk).
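As a concrete (assumed) instance, take fk = Normal(θk, 1) with independent N(0, s0²) priors on the component means; each full conditional is then Normal and uses only the observations currently labelled k.

```python
import numpy as np

def update_component_means(y, u, K, rng, s0=10.0):
    """Draw each theta_k from pi(theta_k | u, y) for Normal(theta_k, 1) components
    with an assumed N(0, s0^2) prior on each mean (conjugate update)."""
    theta_new = np.empty(K)
    for k in range(K):
        yk = y[u == k]                              # data points with u_i = k
        post_var = 1.0 / (len(yk) + 1.0 / s0**2)    # posterior variance
        post_mean = post_var * yk.sum()             # prior mean is 0
        theta_new[k] = rng.normal(post_mean, np.sqrt(post_var))
    return theta_new
```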

16
Q

If the prior for the mixture weights ω = (ω₁, ..., ωK) is Dirichlet(α₁, ..., αK), what is the conditional posterior distribution π(ω | u, y)?

A

It is also a Dirichlet distribution: Dirichlet(n₁ + α₁, ..., nK + αK), where nk = Σᵢ₌₁ⁿ 1{u_i = k} is the count of data points assigned to component k.
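A short sketch of this conjugate update, where alpha is whatever Dirichlet hyperparameter vector was chosen.

```python
import numpy as np

def update_weights(u, K, alpha, rng):
    """Draw omega from Dirichlet(n_1 + alpha_1, ..., n_K + alpha_K)."""
    counts = np.bincount(u, minlength=K)    # n_k = number of i with u_i = k
    return rng.dirichlet(counts + alpha)
```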

17
Q

What is the form of the conditional distribution π(u | θ, y) for the auxiliary variables in the mixture model?

A

It factorizes as π(u | θ, y) = ∏ᵢ₌₁ⁿ π(u_i | θ, y_i).

18
Q

How is the probability P(U_i = k | θ, y_i) calculated for the discrete auxiliary variable U_i in the mixture model?

A

Using Bayes’ theorem: P(U_i = k | θ, y_i) = ωk fk(y_i | θk) / Σⱼ₌₁ᴷ ωj fj(y_i | θj). This is a discrete probability distribution over {1, ..., K}.
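A sketch of this label update, again assuming Normal(θk, 1) components so that fk can be evaluated in closed form.

```python
import numpy as np

def update_labels(y, theta, omega, rng):
    """Draw each U_i from P(U_i = k | theta, y_i) proportional to omega_k * f_k(y_i | theta_k),
    with f_k = Normal(theta_k, 1) assumed for illustration."""
    dens = np.exp(-0.5 * (y[:, None] - theta) ** 2) / np.sqrt(2 * np.pi)  # f_k(y_i | theta_k)
    resp = omega * dens                        # numerator: omega_k * f_k(y_i | theta_k)
    resp /= resp.sum(axis=1, keepdims=True)    # normalise each row (Bayes' theorem)
    return np.array([rng.choice(len(theta), p=p) for p in resp])  # one categorical draw per i
```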