Week 8 Flashcards
What is the goal of the Gibbs sampler algorithm?
To obtain a sample from the multivariate posterior distribution π(θ|x), where θ = (θ₁, ..., θd).
Describe the initialization step (step 1) of the Gibbs sampler.
Initialize the parameter vector with starting values: θ⁽⁰⁾ = (θ₁⁽⁰⁾, ..., θd⁽⁰⁾).
Describe the iterative sampling step (step 2) of the Gibbs sampler for θ = (θ₁, ..., θd).
For iteration i = 1 to m:
- Simulate θ₁⁽ⁱ⁾ from the conditional π(θ₁ | θ₂⁽ⁱ⁻¹⁾, ..., θd⁽ⁱ⁻¹⁾, x).
- Simulate θ₂⁽ⁱ⁾ from π(θ₂ | θ₁⁽ⁱ⁾, θ₃⁽ⁱ⁻¹⁾, ..., θd⁽ⁱ⁻¹⁾, x).
- Continue in this way, simulating θj⁽ⁱ⁾ from π(θj | θ₁⁽ⁱ⁾, ..., θj₋₁⁽ⁱ⁾, θj₊₁⁽ⁱ⁻¹⁾, ..., θd⁽ⁱ⁻¹⁾, x).
- Finish with θd⁽ⁱ⁾ from π(θd | θ₁⁽ⁱ⁾, ..., θd₋₁⁽ⁱ⁾, x).
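A minimal runnable sketch of steps 1 and 2, assuming (purely for illustration; not from the cards) that the target is a standard bivariate Normal with correlation ρ, whose full conditionals are θ₁|θ₂ ~ N(ρθ₂, 1−ρ²) and θ₂|θ₁ ~ N(ρθ₁, 1−ρ²):

```python
import numpy as np

def gibbs_bivariate_normal(m=5000, rho=0.8, seed=0):
    """Gibbs sampler for a standard bivariate Normal with correlation rho."""
    rng = np.random.default_rng(seed)
    theta = np.zeros(2)            # step 1: starting value theta^(0)
    draws = np.empty((m, 2))
    sd = np.sqrt(1.0 - rho**2)     # conditional standard deviation
    for i in range(m):             # step 2: one sweep per iteration
        # theta_1^(i) ~ pi(theta_1 | theta_2^(i-1))
        theta[0] = rng.normal(rho * theta[1], sd)
        # theta_2^(i) ~ pi(theta_2 | theta_1^(i))
        theta[1] = rng.normal(rho * theta[0], sd)
        draws[i] = theta
    return draws                   # discard an initial burn-in before use
```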
What constitutes one ‘sweep’ or ‘scan’ of the Gibbs sampler?
Completing the simulation of all components θ₁⁽ⁱ⁾ through θd⁽ⁱ⁾ for a single iteration i.
What is the final output of the Gibbs sampler algorithm after m iterations?
A collection of samples (θ⁽¹⁾, θ⁽²⁾, ..., θ⁽ᵐ⁾), which represents draws from the target posterior distribution π(θ|x) after a suitable burn-in period.
When is the Gibbs sampler particularly useful?
When the full conditional distributions π(θj | θ₋j, x) are easy to sample from (e.g., standard distributions), often due to conditional conjugacy.
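A concrete sketch of conditional conjugacy, under assumed semi-conjugate priors (m0, s0, a0, b0 are illustrative choices, not from the cards): for Normal data with unknown mean μ and precision τ, priors μ ~ N(m0, s0²) and τ ~ Gamma(a0, b0) make both full conditionals standard distributions.

```python
import numpy as np

def gibbs_normal(x, m=2000, m0=0.0, s0=10.0, a0=1.0, b0=1.0, seed=0):
    """Gibbs sampler for N(mu, 1/tau) data under semi-conjugate priors:
    mu | tau, x is Normal and tau | mu, x is Gamma."""
    rng = np.random.default_rng(seed)
    n, xbar = len(x), np.mean(x)
    mu, tau = xbar, 1.0                        # starting values
    out = np.empty((m, 2))
    for i in range(m):
        # mu | tau, x: precision-weighted combination of prior and data
        prec = 1.0 / s0**2 + n * tau
        mean = (m0 / s0**2 + tau * n * xbar) / prec
        mu = rng.normal(mean, np.sqrt(1.0 / prec))
        # tau | mu, x ~ Gamma(a0 + n/2, rate b0 + 0.5*sum((x-mu)^2));
        # numpy parameterizes by scale = 1/rate
        tau = rng.gamma(a0 + n / 2, 1.0 / (b0 + 0.5 * np.sum((x - mu)**2)))
        out[i] = mu, tau
    return out
```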
What are the three stages typically defined in a Bayesian hierarchical model?
- Stage I: Data model/likelihood: x_i | θ_i ~ f(x_i | θ_i).
- Stage II: Prior for parameters: θ_i | φ ~ π₀(θ_i | φ).
- Stage III: Hyperprior for hyperparameters: φ ~ π₀(φ).
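A generative sketch of the three stages, with Normal distributions assumed at every level purely for illustration (the cards leave f and π₀ generic):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 8

# Stage III: hyperprior, here phi ~ N(0, 5^2)   (illustrative assumption)
phi = rng.normal(0.0, 5.0)

# Stage II: theta_i | phi ~ pi_0(. | phi), drawn exchangeably; here N(phi, 1)
theta = rng.normal(phi, 1.0, size=n)

# Stage I: data model, x_i | theta_i ~ f(. | theta_i); here N(theta_i, 1)
x = rng.normal(theta, 1.0)
```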
In a hierarchical model, what does it mean for the parameters θ = (θ₁, ..., θn) to be generated exchangeably?
They are assumed to be drawn from a common population distribution governed by a hyperparameter φ.
Write the proportionality relationship for the joint posterior distribution π(φ, θ | x) in a standard hierarchical model.
π(φ, θ | x) ∝ [Πᵢ₌₁ⁿ f(x_i | θ_i) × π₀(θ_i | φ)] × π₀(φ).
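This proportionality translates directly into an unnormalized log posterior; a sketch, reusing the illustrative all-Normal choices from the generative sketch above:

```python
import numpy as np
from scipy.stats import norm

def log_joint_posterior(phi, theta, x):
    """Unnormalized log pi(phi, theta | x) = sum_i [log f(x_i | theta_i)
    + log pi0(theta_i | phi)] + log pi0(phi). The all-Normal choices are
    illustrative assumptions, not specified by the cards."""
    lp = np.sum(norm.logpdf(x, theta, 1.0))        # Stage I: likelihood terms
    lp += np.sum(norm.logpdf(theta, phi, 1.0))     # Stage II: prior terms
    lp += norm.logpdf(phi, 0.0, 5.0)               # Stage III: hyperprior term
    return lp
```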
What is the main idea behind using auxiliary variables in MCMC (like Gibbs sampling)?
To introduce additional variables U such that the joint distribution π(θ, u | x) has the target marginal π(θ|x), but the full conditionals π(θ | u, x) and π(u | θ, x) are easier to sample from.
What are two desired properties when introducing auxiliary variables U?
1. The full conditionals π(θ | u, x) and π(u | θ, x) are straightforward to sample.
2. The introduction of U breaks complex dependence structures among the original variables θ.
In the context of a K-component finite mixture model f_y(y|θ) = Σₖ₌₁ᴷ ωk fk(y|θk), what auxiliary variables U₁, ..., Un are introduced?
Discrete labels U_i indicating which component density (f₁, ..., fK) generated the i-th data point y_i.
What is the distribution of the auxiliary variable U_i in the mixture model example?
U_i follows a Categorical (or Multinomial(1, ω₁, ..., ωK)) distribution with P(U_i = k) = ωk for k = 1, ..., K.
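Drawing such labels is a one-liner in NumPy (the weights below are assumed values for a K = 3 example; note the 0-based labels k = 0, ..., K−1):

```python
import numpy as np

rng = np.random.default_rng(2)
omega = np.array([0.5, 0.3, 0.2])   # assumed mixture weights, K = 3
n = 10

# U_i ~ Categorical(omega), i.e. P(U_i = k) = omega_k
u = rng.choice(len(omega), size=n, p=omega)
```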
In the mixture model Gibbs sampler with auxiliary variables u, the conditional posterior π(θ|u, y) factorizes into which two independent parts?
It factorizes into π(θ₁, ..., θK | u, y) and π(ω₁, ..., ωK | u, y).
If the prior on θk factorizes as Π π₀(θk), how is the conditional posterior π(θk | u, y) updated for component k?
It depends only on the data points y_i for which u_i = k. Specifically, π(θk | u, y) ∝ [Π_{i: u_i=k} fk(y_i | θk)] π₀(θk).
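A sketch of this update for one component, assuming N(θk, 1) component densities and a N(0, s0²) prior on θk (both assumed; the cards keep fk and π₀ generic):

```python
import numpy as np

def sample_theta_k(y, u, k, s0=10.0, rng=None):
    """Draw theta_k from pi(theta_k | u, y): only the y_i with u_i == k
    enter the update. With N(theta_k, 1) components and a N(0, s0^2)
    prior, the conditional posterior is again Normal."""
    rng = rng or np.random.default_rng()
    yk = y[u == k]                    # data currently allocated to component k
    prec = 1.0 / s0**2 + len(yk)      # posterior precision
    mean = yk.sum() / prec            # posterior mean (prior mean is 0)
    return rng.normal(mean, np.sqrt(1.0 / prec))
```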
If the prior for the mixture weights ω = (ω₁, ..., ωK) is Dirichlet(α₁, ..., αK), what is the conditional posterior distribution π(ω | u, y)?
It is also a Dirichlet distribution: Dirichlet(n₁ + α₁, ..., nK + αK), where nk = Σᵢ₌₁ⁿ 1{u_i = k} is the count of data points assigned to component k.
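This conjugate update amounts to counting the current allocations and drawing from the resulting Dirichlet; a minimal sketch (0-based labels assumed):

```python
import numpy as np

def sample_weights(u, K, alpha, rng=None):
    """Draw omega from Dirichlet(n_1 + alpha_1, ..., n_K + alpha_K),
    where n_k counts the points currently allocated to component k."""
    rng = rng or np.random.default_rng()
    n_k = np.bincount(u, minlength=K)     # n_k = sum_i 1{u_i = k}
    return rng.dirichlet(n_k + alpha)
```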
What is the form of the conditional distribution π(u | θ, y) for the auxiliary variables in the mixture model?
It factorizes as π(u | θ, y) = Πᵢ₌₁ⁿ π(u_i | θ, y_i).
How is the probability P(U_i = k | θ, y_i) calculated for the discrete auxiliary variable U_i in the mixture model?
Using Bayes’ theorem: P(U_i = k | θ, y_i) = [fk(y_i | θk) × ωk] / [Σⱼ₌₁ᴷ fj(y_i | θj) × ωj]. It’s a discrete probability distribution over {1, ..., K}.
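A vectorized sketch of this update, again assuming N(θk, 1) component densities: each row of the weight matrix is normalized per Bayes’ theorem, then one label is drawn per observation by inverse-CDF sampling.

```python
import numpy as np
from scipy.stats import norm

def sample_labels(y, theta, omega, rng=None):
    """Draw each U_i from P(U_i = k | theta, y_i), proportional to
    omega_k * f_k(y_i | theta_k); N(theta_k, 1) components assumed."""
    rng = rng or np.random.default_rng()
    w = omega * norm.pdf(y[:, None], loc=theta, scale=1.0)  # n x K numerators
    w /= w.sum(axis=1, keepdims=True)    # normalize each row (Bayes' theorem)
    cum = np.cumsum(w, axis=1)           # row-wise CDFs
    # first index where a uniform draw falls below the CDF = categorical draw
    return (rng.random(len(y))[:, None] < cum).argmax(axis=1)
```

Alternating this draw with sample_theta_k and sample_weights above gives one full sweep of the mixture-model Gibbs sampler.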