Week 8 Flashcards
What is the goal of the Gibbs sampler algorithm?
To obtain a sample from a multivariate posterior distribution π(θ|x) where θ = (θ₁, …, θd).
Describe the initialization step (step 1) of the Gibbs sampler.
Initialize the parameter vector with starting values: θ⁽⁰⁾ = (θ₁⁽⁰⁾, …, θd⁽⁰⁾).
Describe the iterative sampling step (step 2) in the Gibbs sampler for θ = (θ₁, …, θd).
For iteration i = 1 to m: Simulate θ₁⁽ⁱ⁾ from the conditional π(θ₁ | θ₂⁽ⁱ⁻¹⁾, …, θd⁽ⁱ⁻¹⁾, x). Then simulate θ₂⁽ⁱ⁾ from π(θ₂ | θ₁⁽ⁱ⁾, θ₃⁽ⁱ⁻¹⁾, …, θd⁽ⁱ⁻¹⁾, x). Continue in this way, simulating θj⁽ⁱ⁾ from π(θj | θ₁⁽ⁱ⁾, …, θ_{j-1}⁽ⁱ⁾, θ_{j+1}⁽ⁱ⁻¹⁾, …, θd⁽ⁱ⁻¹⁾, x), up to θd⁽ⁱ⁾ from π(θd | θ₁⁽ⁱ⁾, …, θ_{d-1}⁽ⁱ⁾, x).
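A minimal numpy sketch of such a sampler, assuming a normal model x_i ~ N(μ, 1/τ) with a N(m₀, 1/p₀) prior on the mean μ and a Gamma(a₀, b₀) prior on the precision τ; the model and all constants here are illustrative choices, not part of the card:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative data and prior constants (hypothetical, not from the card)
x = rng.normal(loc=2.0, scale=1.5, size=100)
n, xbar = x.size, x.mean()
m0, p0 = 0.0, 0.01    # prior mean and prior precision for mu
a0, b0 = 1.0, 1.0     # Gamma shape and rate for the precision tau

m = 5000
mu, tau = 0.0, 1.0    # step 1: starting values theta^(0)
draws = np.empty((m, 2))

for i in range(m):    # step 2: one sweep per iteration
    # simulate mu^(i) from pi(mu | tau^(i-1), x): a normal full conditional
    prec = p0 + n * tau
    mu = rng.normal((p0 * m0 + tau * n * xbar) / prec, 1.0 / np.sqrt(prec))
    # simulate tau^(i) from pi(tau | mu^(i), x): a Gamma full conditional
    tau = rng.gamma(a0 + 0.5 * n, 1.0 / (b0 + 0.5 * np.sum((x - mu) ** 2)))
    draws[i] = mu, tau

posterior = draws[1000:]   # discard a burn-in before using the draws
```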
What constitutes one ‘sweep’ or ‘scan’ of the Gibbs sampler?
Completing the simulation of all components θ₁⁽ⁱ⁾ through θd⁽ⁱ⁾ for a single iteration i.
What is the final output of the Gibbs sampler algorithm after m iterations?
A collection of samples (θ⁽¹⁾, θ⁽²⁾, …, θ⁽ᵐ⁾), which represents draws from the target posterior distribution π(θ|x) after a suitable burn-in period.
When is the Gibbs sampler particularly useful?
When the full conditional distributions π(θj | θ_{-j}, x) are easy to sample from, often due to conditional conjugacy.
What are the three stages typically defined in a Bayesian hierarchical model?
Stage I: Data model/likelihood (x_i | θ_i ~ f(x_i | θ_i)). Stage II: Prior for parameters (θ_i | φ ~ π₀(θ_i | φ)). Stage III: Hyperprior for hyperparameters (φ ~ π₀(φ)).
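A concrete instance (illustrative, not from the card): Stage I: x_i | θ_i ~ N(θ_i, σ²); Stage II: θ_i | φ ~ N(φ, τ²); Stage III: φ ~ N(0, A²) for some large A.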
In a hierarchical model, what does it mean for the parameters θ = (θ₁, …, θn) to be generated exchangeably?
They are assumed to be drawn from a common population distribution governed by a hyperparameter φ.
Write the proportionality relationship for the joint posterior distribution π(φ, θ | x) in a standard hierarchical model.
π(φ, θ | x) ∝ [Π_{i=1}^n f(x_i | θ_i) × π₀(θ_i | φ)] × π₀(φ).
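From this joint posterior, the full conditionals used by the Gibbs sampler follow by keeping only the factors involving the component being updated: π(θ_i | φ, θ_{-i}, x) ∝ f(x_i | θ_i) × π₀(θ_i | φ) for each i, and π(φ | θ, x) ∝ π₀(φ) × Π_{i=1}^n π₀(θ_i | φ). Because the θ_i are conditionally independent given φ, the θ-update splits into n separate low-dimensional draws.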
What is the main idea behind using auxiliary variables in MCMC (like Gibbs sampling)?
To introduce additional variables U such that the joint distribution π(θ, u | x) has the target marginal π(θ|x), but the full conditionals π(θ | u, x) and π(u | θ, x) are easier to sample from.
What are two desired properties when introducing auxiliary variables U?
1. The full conditionals π(θ | u, x) and π(u | θ, x) are straightforward to sample from. 2. The introduction of U breaks complex dependence structures among the original variables θ.
In the context of a K-component finite mixture model fy(y|θ) = Σ_{k=1}^K ωk fk(y|θk), what auxiliary variables U₁, …, Un are introduced?
Discrete labels U_i indicating which component density (f₁, …, fK) generated the i-th data point y_i.
What is the distribution of the auxiliary variable U_i in the mixture model example?
U_i follows a categorical distribution (equivalently, Multinomial(1, ω) with ω = (ω₁, …, ωK)) with P(U_i = k) = ωk for k = 1, …, K.
In the mixture model Gibbs sampler with auxiliary variables u, the conditional posterior π(θ|u, y) factorizes into which two independent parts?
It factorizes into π(θ₁, …, θK | u, y) and π(ω₁, …, ωK | u, y).
If the prior on θk factorizes as Π π₀(θk), how is the conditional posterior π(θk | u, y) updated for component k?
It depends only on the data points y_i for which u_i = k. Specifically, π(θk | u, y) ∝ [Π_{i: u_i=k} fk(yi | θk)] π₀(θk).
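A sketch of this update, assuming N(θk, 1) component densities with a N(0, 100) prior on each θk (hypothetical choices used only for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)

def update_component_means(y, u, K, prior_var=100.0):
    """Draw theta_k from pi(theta_k | u, y) for N(theta_k, 1) components
    with a N(0, prior_var) prior: a conjugate normal update that uses
    only the observations currently labelled k."""
    theta = np.empty(K)
    for k in range(K):
        yk = y[u == k]
        prec = 1.0 / prior_var + yk.size   # posterior precision
        theta[k] = rng.normal(yk.sum() / prec, 1.0 / np.sqrt(prec))
    return theta
```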
If the prior for the mixture weights ω = (ω₁, …, ωK) is Dirichlet(α₁, …, αK), what is the conditional posterior distribution π(ω | u, y)?
It is also a Dirichlet distribution: Dirichlet(n₁ + α₁, …, nK + αK), where nk = Σ_{i=1}^n 1{u_i = k} is the count of data points assigned to component k.
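In numpy this update is one line once the counts nk are tallied (alpha is the vector of Dirichlet hyperparameters; the helper name is ours):

```python
import numpy as np

rng = np.random.default_rng(2)

def update_weights(u, K, alpha):
    """Draw omega from Dirichlet(n_1 + alpha_1, ..., n_K + alpha_K)."""
    counts = np.bincount(u, minlength=K)   # n_k = number of points with u_i = k
    return rng.dirichlet(counts + alpha)
```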
What is the form of the conditional distribution π(u | θ, y) for the auxiliary variables in the mixture model?
It factorizes as π(u | θ, y) = Π_{i=1}^n π(u_i | θ, y_i).
How is the probability P(U_i = k | θ, y_i) calculated for the discrete auxiliary variable U_i in the mixture model?
Using Bayes’ theorem: P(U_i = k | θ, y_i) = [ωk × fk(y_i | θk)] / [Σ_{j=1}^K ωj × fj(y_i | θj)]. This defines a discrete probability distribution over {1, …, K}.
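A vectorised sketch of this label update, again assuming N(θk, 1) component densities (the normal density's normalising constant cancels in the ratio, so it is dropped):

```python
import numpy as np

rng = np.random.default_rng(3)

def update_labels(y, theta, omega):
    """Draw each u_i from P(U_i = k | theta, y_i), proportional to
    omega_k * f_k(y_i | theta_k), here with N(theta_k, 1) densities."""
    w = omega * np.exp(-0.5 * (y[:, None] - theta) ** 2)  # n x K, unnormalised
    w /= w.sum(axis=1, keepdims=True)                     # Bayes' theorem, row-wise
    return np.array([rng.choice(theta.size, p=row) for row in w])
```

One sweep of the mixture Gibbs sampler then alternates update_labels, update_weights, and the component-mean update sketched above.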