Week 5 Flashcards
How does the interpretation of a Bayesian credible interval differ from a frequentist confidence interval?
A 100(1-α)% credible interval (L, U) contains the parameter θ with probability 1-α given the observed data x, i.e., P(L ≤ θ ≤ U | x) = 1-α. A frequentist confidence interval (L(X), U(X)) is a random interval such that, in the long run, 100(1-α)% of such intervals would contain the fixed, unknown true parameter value.
What is the mathematical definition of a 100(1-α)% Bayesian Credible Interval I = (L, U)?
It is an interval such that the integral of the posterior density π(θ|x) from L to U equals 1-α: ∫<sub>[L, U]</sub> π(θ|x) dθ = 1-α.
Are Bayesian credible intervals unique for a given posterior distribution and confidence level α?
No; in general there are infinitely many intervals (L, U) that satisfy the definition for a given α and posterior.
What is a Quantile Credible Interval?
An interval (L, U) where L is the α/2 quantile (Q<sub>α/2</sub>) and U is the 1-α/2 quantile (Q<sub>1-α/2</sub>) of the posterior distribution π(θ|x).
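A minimal sketch of computing a quantile interval with SciPy, assuming a hypothetical Beta(8, 4) posterior (e.g., a Beta(1, 1) prior updated with 7 successes in 10 Bernoulli trials):

```python
from scipy import stats

# Hypothetical conjugate example: Beta(1, 1) prior + 7 successes in 10
# Bernoulli trials gives a Beta(8, 4) posterior for theta.
alpha = 0.05
posterior = stats.beta(8, 4)
L = posterior.ppf(alpha / 2)        # Q_{alpha/2}
U = posterior.ppf(1 - alpha / 2)    # Q_{1-alpha/2}
print(f"95% quantile credible interval: ({L:.3f}, {U:.3f})")
```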
What defines a Highest Posterior Density (HPD) credible interval (L, U)?
For any θ' inside the interval [L, U] and any θ'' outside it, the posterior density at θ' is greater than or equal to the posterior density at θ'': π(θ'|x) ≥ π(θ''|x).
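A minimal sketch for approximating an HPD interval from posterior draws, assuming a unimodal posterior (so the HPD region is a single interval); it uses the shortest-interval characterization noted in the next card:

```python
import numpy as np

def hpd_interval(samples, alpha=0.05):
    """Approximate 100(1-alpha)% HPD interval from posterior samples.

    Assumes a unimodal posterior: among all intervals containing a
    fraction (1-alpha) of the draws, the HPD interval is the shortest.
    """
    s = np.sort(samples)
    n = len(s)
    m = int(np.ceil((1 - alpha) * n))     # draws the interval must cover
    widths = s[m - 1:] - s[: n - m + 1]   # width of each candidate interval
    i = np.argmin(widths)                 # shortest candidate wins
    return s[i], s[i + m - 1]
```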
What is a key characteristic of HPD intervals compared to other credible intervals of the same level?
They are the shortest possible interval containing 100(1-α)% posterior probability.
If the posterior distribution is symmetric and unimodal (like a Normal distribution), how does the HPD interval relate to the quantile interval?
They coincide, and π(L|x) = π(U|x).
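A quick numerical check of this fact, using a standard Normal as a stand-in posterior: the 95% quantile interval has equal density at both endpoints, so it is also the HPD interval.

```python
from scipy import stats

post = stats.norm(0, 1)                  # stand-in symmetric, unimodal posterior
L, U = post.ppf(0.025), post.ppf(0.975)  # quantile interval endpoints
print(post.pdf(L), post.pdf(U))          # equal densities: pi(L|x) = pi(U|x)
```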
How does a quantile credible interval transform under a bijective (monotonic) transformation g(θ)?
If [L, U] is a quantile interval for θ, then [g(L), g(U)] is the corresponding quantile interval for g(θ) (assuming g is increasing; for a decreasing g the endpoints swap to [g(U), g(L)]).
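A quick simulation of this invariance, using stand-in Normal posterior draws and the increasing map g(θ) = e^θ: the transformed endpoints match the sample quantiles of the transformed draws.

```python
import numpy as np

rng = np.random.default_rng(0)
theta = rng.normal(0.0, 1.0, 100_000)    # stand-in posterior draws of theta

L, U = np.quantile(theta, [0.025, 0.975])            # quantile interval for theta
Lg, Ug = np.quantile(np.exp(theta), [0.025, 0.975])  # interval for g(theta)=e^theta
print(np.exp(L) - Lg, np.exp(U) - Ug)                # both ~0: endpoints map over
```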
Is the same transformation property true for HPD intervals?
No, only if the transformation g is linear. Otherwise, the HPD interval for g(θ) must be recalculated.
In a multivariate setting with parameter θ = (δ, ξ), where δ is the parameter of interest and ξ is a nuisance parameter, how is the marginal posterior distribution for δ, π(δ|x), obtained?
By integrating the joint posterior distribution π(δ, ξ|x) over the nuisance parameter ξ: π(δ|x) = ∫ π(δ, ξ|x) dξ.
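A minimal numerical illustration, assuming a hypothetical bivariate Normal joint posterior for (δ, ξ): integrating out ξ by quadrature recovers the known N(0, 1) marginal.

```python
import numpy as np
from scipy import stats, integrate

# Hypothetical joint posterior: (delta, xi) bivariate Normal, correlation 0.5.
joint = stats.multivariate_normal(mean=[0, 0], cov=[[1, 0.5], [0.5, 1]])

def marginal_delta(d):
    # pi(delta|x) = integral over xi of pi(delta, xi|x)
    return integrate.quad(lambda xi: joint.pdf([d, xi]), -10, 10)[0]

# The marginal here is exactly N(0, 1), so the two values should agree:
print(marginal_delta(0.0), stats.norm(0, 1).pdf(0.0))
```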
If we have N samples θ¹, ..., θᴺ drawn from the posterior distribution π(θ|x), how can we approximate the posterior mean E[θ|x]?
Using the sample mean: (1/N) * Σ<sub>j=1</sub><sup>N</sup> θʲ.
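A one-line sketch with stand-in Beta(8, 4) posterior draws (the exact posterior mean is 8/12 ≈ 0.667):

```python
import numpy as np

rng = np.random.default_rng(1)
theta = rng.beta(8, 4, size=10_000)   # stand-in draws from pi(theta|x)
post_mean = theta.mean()              # (1/N) * sum_j theta^j
print(post_mean)                      # close to the exact mean 8/12 ≈ 0.667
```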
How can we approximate the posterior probability P(L ≤ θ ≤ U | x) using N posterior samples θ¹, ..., θᴺ?
By calculating the proportion of samples that fall within the interval [L, U]: (1/N) * Σ<sub>j=1</sub><sup>N</sup> 1<sub>[L,U]</sub>(θʲ), where 1<sub>[L,U]</sub> is the indicator function.
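The same idea in code, with hypothetical bounds L = 0.5, U = 0.8 and stand-in posterior draws:

```python
import numpy as np

rng = np.random.default_rng(2)
theta = rng.beta(8, 4, size=10_000)          # stand-in posterior draws
L, U = 0.5, 0.8                              # hypothetical interval bounds
prob = np.mean((theta >= L) & (theta <= U))  # proportion landing in [L, U]
print(prob)
```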
How can Monte Carlo samples be used to estimate a quantile (e.g., the α-th quantile) of the posterior distribution?
Draw N samples, sort them, and take the sample at the desired quantile index (e.g., k ≈ ceil(α * N)). (See Algorithm 1 in the notes.)
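A minimal sketch of that procedure (using the index convention k = ceil(α * N) stated above; Algorithm 1 itself is not reproduced here):

```python
import numpy as np

def mc_quantile(samples, alpha):
    """Empirical alpha-quantile: sort the N draws, take the ceil(alpha*N)-th."""
    s = np.sort(samples)
    k = max(int(np.ceil(alpha * len(s))), 1)   # 1-based index k
    return s[k - 1]                            # convert to 0-based indexing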
What general class of methods is mentioned for sampling from posterior distributions, especially when they are not standard or conjugate, often without needing the normalizing constant?
Markov Chain Monte Carlo (MCMC) methods (e.g., Metropolis-Hastings, Gibbs sampler).
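A minimal random-walk Metropolis sketch (not the exact algorithm from the notes; log_post, init, and step are assumed inputs). Note that only the unnormalized log posterior is required:

```python
import numpy as np

def metropolis_hastings(log_post, init, n_iter=10_000, step=0.5, seed=0):
    """Random-walk Metropolis sampler.

    Needs log pi(theta|x) only up to an additive constant: the unknown
    normalizing constant cancels in the acceptance ratio.
    """
    rng = np.random.default_rng(seed)
    theta, samples = init, np.empty(n_iter)
    for i in range(n_iter):
        proposal = theta + step * rng.normal()
        # Accept with probability min(1, pi(proposal|x) / pi(theta|x)).
        if np.log(rng.uniform()) < log_post(proposal) - log_post(theta):
            theta = proposal
        samples[i] = theta
    return samples
```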
If MCMC provides samples from the joint posterior π(δ, ξ|x), how can samples from the marginal posterior π(δ|x) be obtained?
By simply discarding the ξ components of the joint samples.
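In code, assuming the sampler returns an N × 2 array with columns ordered (δ, ξ), this is just a column selection:

```python
import numpy as np

# Stand-in MCMC output: N joint draws, columns ordered (delta, xi).
joint_samples = np.random.default_rng(3).normal(size=(10_000, 2))
delta_samples = joint_samples[:, 0]   # keep delta, drop xi: marginal draws
```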
What is the Prior Predictive Distribution f(x*)? What does it represent?
f(x*) = ∫ f(x*|θ) π₀(θ) dθ. It represents the expected distribution of a new observation x* before collecting any data x, averaging over the prior uncertainty about θ.
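A minimal sampling sketch, assuming a hypothetical Beta(1, 1) prior and a Binomial(10, θ) likelihood: draw θ from the prior, then x* given θ.

```python
import numpy as np

rng = np.random.default_rng(4)
N = 10_000
theta = rng.beta(1, 1, size=N)      # 1) draw theta from the prior pi_0(theta)
x_star = rng.binomial(10, theta)    # 2) draw x* | theta from the likelihood
# x_star now holds N draws from the prior predictive f(x*)
```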
What is the Posterior Predictive Distribution f(x*|x)? What does it represent?
f(x*|x) = ∫ f(x*|θ) π(θ|x) dθ. It represents the expected distribution of a new observation x* after observing data x, averaging over the remaining (posterior) uncertainty about θ.
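The same sketch with the prior replaced by a hypothetical Beta(8, 4) posterior: draw θ from π(θ|x), then x* given θ.

```python
import numpy as np

rng = np.random.default_rng(5)
N = 10_000
theta = rng.beta(8, 4, size=N)      # 1) draw theta from the posterior pi(theta|x)
x_star = rng.binomial(10, theta)    # 2) draw x* | theta from the likelihood
# x_star now holds N draws from the posterior predictive f(x*|x)
```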
How is the posterior predictive distribution f(x*|x) derived or justified intuitively?
It’s the likelihood f(x*|θ) for a new observation, weighted by the posterior beliefs about θ, π(θ|x), and integrated over all possible θ.