Week 9 Flashcards

Define the Bayes factor B₀₁ in terms of posterior and prior odds for models M₀ and M₁, given data x.
B₀₁(x) = [P(M₀|x) / P(M₁|x)] / [π₀(M₀) / π₀(M₁)]
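
As a worked example with illustrative numbers: if the prior probabilities are π₀(M₀) = π₀(M₁) = 0.5 and the posteriors are P(M₀|x) = 0.8 and P(M₁|x) = 0.2, then B₀₁(x) = (0.8/0.2) / (0.5/0.5) = 4.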

Define the Bayes factor B₀₁ in terms of marginal likelihoods f(x|Mⱼ).
B₀₁(x) = f(x|M₀) / f(x|M₁)

What does the quantity f(x|Mⱼ) represent?
It is the marginal likelihood (or evidence) of the data x under model Mⱼ, calculated as ∫ L(θⱼ|x, Mⱼ) π₀(θⱼ|Mⱼ) dθⱼ.
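
The sketch below, a minimal illustration rather than anything from the notes, evaluates this integral numerically for i.i.d. N(θ, 1) observations with an assumed N(0, 1) prior on θ; the data values are made up.

```python
import numpy as np
from scipy import stats
from scipy.integrate import quad

x = np.array([0.3, -0.1, 0.8, 0.5])  # toy data (assumed)

def integrand(theta):
    # L(theta|x) * pi0(theta) for i.i.d. N(theta, 1) data, N(0, 1) prior
    likelihood = np.prod(stats.norm.pdf(x, loc=theta, scale=1.0))
    prior = stats.norm.pdf(theta, loc=0.0, scale=1.0)
    return likelihood * prior

evidence, _ = quad(integrand, -10.0, 10.0)  # f(x|M), the model evidence
print(evidence)
```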

What does a Bayes factor B₀₁(x) > 1 signify?
The data x provides evidence in favour of model M₀ over model M₁. The relative plausibility of M₀ has increased.

What does a Bayes factor B₀₁(x) < 1 signify?
The data x provides evidence in favour of model M₁ over model M₀. The relative plausibility of M₁ has increased.

According to the interpretation table provided (Kass and Raftery), what range for the Bayes factor B₀₁ corresponds to ‘Strong’ evidence against H₁ (in favour of H₀)?
20 - 150

According to the interpretation table, what range for 2 log(B₀₁) corresponds to ‘Positive’ evidence against H₁?
2 - 6
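
For context, the full Kass and Raftery scale for evidence against H₁ runs: B₀₁ of 1 - 3 (2 log(B₀₁) of 0 - 2) is ‘not worth more than a bare mention’; 3 - 20 (2 - 6) is ‘Positive’; 20 - 150 (6 - 10) is ‘Strong’; above 150 (above 10) is ‘Very strong’.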

If comparing k candidate models {Mⱼ}, how can the posterior probability P(Mⱼ|x) be calculated?
P(Mⱼ|x) = f(x|Mⱼ)π₀(Mⱼ) / Σᵢ₌₁ᵏ f(x|Mᵢ)π₀(Mᵢ), where π₀(Mⱼ) is the prior probability of model Mⱼ.
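
A minimal sketch of this normalisation, with assumed marginal likelihoods and equal model priors:

```python
import numpy as np

evidence = np.array([1.2e-4, 3.5e-4, 0.8e-4])  # f(x|M_j) for k = 3 models (toy values)
prior = np.array([1 / 3, 1 / 3, 1 / 3])        # pi0(M_j), equal a priori

posterior = evidence * prior
posterior /= posterior.sum()  # divide by the sum over i = 1, ..., k
print(posterior)              # approx. [0.218 0.636 0.145]
```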

For the parametric model F(x|θ), write the Bayes factor B₀₁ for testing the simple hypothesis H₀: θ = θ₀ versus the simple alternative H₁: θ = θ₁.
B₀₁(x) = f(x|θ₀) / f(x|θ₁)
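
A minimal sketch of this simple-vs-simple Bayes factor, assuming i.i.d. N(θ, 1) data and testing θ₀ = 0 against θ₁ = 1 (all values made up):

```python
import numpy as np
from scipy import stats

x = np.array([0.3, -0.1, 0.8, 0.5])  # toy data (assumed)

def lik(theta):
    # f(x|theta) for i.i.d. N(theta, 1) observations
    return np.prod(stats.norm.pdf(x, loc=theta, scale=1.0))

B01 = lik(0.0) / lik(1.0)  # f(x|theta_0) / f(x|theta_1)
print(B01)
```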

For the parametric model F(x|θ), write the Bayes factor B₀₁ for testing the simple hypothesis H₀: θ = θ₀ versus the composite alternative H₁: θ ≠ θ₀.
B₀₁(x) = f(x|θ₀) / ∫_{Θ ∖ {θ₀}} f(x|θ) π₀,₁(θ) dθ, where π₀,₁ is the prior under H₁.
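
For the composite alternative the denominator becomes an integral; here is a minimal sketch reusing the setup above, with an assumed N(0, 2²) prior π₀,₁ under H₁. (Since {θ₀} has zero probability under a continuous prior, integrating over all of Θ equals integrating over Θ ∖ {θ₀}.)

```python
import numpy as np
from scipy import stats
from scipy.integrate import quad

x = np.array([0.3, -0.1, 0.8, 0.5])  # toy data (assumed)

def lik(theta):
    return np.prod(stats.norm.pdf(x, loc=theta, scale=1.0))

numerator = lik(0.0)  # f(x|theta_0) with theta_0 = 0
denominator, _ = quad(lambda t: lik(t) * stats.norm.pdf(t, 0.0, 2.0), -20.0, 20.0)
print(numerator / denominator)  # B01(x)
```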

What approximation is used as the basis for deriving the Bayesian Information Criterion (BIC)?
The Laplace approximation of the marginal likelihood f(x).

State the formula for the Bayesian Information Criterion (BIC).
BIC = -2 l(θ̂_MLE) + p log n
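
A minimal sketch of the computation, fitting an i.i.d. N(μ, σ²) model (p = 2 parameters) by maximum likelihood to assumed toy data:

```python
import numpy as np
from scipy import stats

x = np.array([0.3, -0.1, 0.8, 0.5, 1.1, 0.2])  # toy data (assumed)
n = len(x)

mu_hat = x.mean()
sigma_hat = x.std()  # np.std has ddof=0 by default, i.e. the MLE of sigma
loglik = stats.norm.logpdf(x, mu_hat, sigma_hat).sum()  # l(theta_hat_MLE)

p = 2  # parameters: mu and sigma
bic = -2.0 * loglik + p * np.log(n)
print(bic)
```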

In the BIC formula, what do l(θ̂_MLE), p, and n represent?
l(θ̂_MLE) is the maximized log-likelihood value, p is the number of parameters in the model, and n is the sample size.

When comparing several models, how is BIC used for model selection?
The model with the lowest BIC value is preferred.
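
For example, with illustrative values BIC(M₀) = 312.4 and BIC(M₁) = 305.9, model M₁ is preferred; since BIC approximates -2 log(f(x)), the difference BIC(M₀) - BIC(M₁) ≈ 2 log(B₁₀), so a gap of about 6.5 sits in the ‘Strong’ band of the scale above.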

What quantity does BIC provide an approximation for?
BIC approximates -2 log(f(x)), where f(x) is the marginal likelihood of the data.

In the derivation of BIC from the Laplace approximation of f(x), why are certain terms involving (2π) and the determinant of the Fisher information often ignored?
These terms form a constant C that does not grow with n (it depends only on the prior and the Fisher information at θ̂_MLE), so for large n it is negligible compared to -2l(θ̂_MLE) and p log n, and it largely cancels when comparing models.
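
For reference, a sketch of that derivation (a standard large-n result, with I(θ̂_MLE) denoting the per-observation Fisher information; included as context, not taken from the cards):

```latex
f(x) = \int L(\theta \mid x)\,\pi_0(\theta)\,d\theta
     \approx L(\hat\theta_{\mathrm{MLE}} \mid x)\,
             \pi_0(\hat\theta_{\mathrm{MLE}})\,
             (2\pi)^{p/2}\, n^{-p/2}\,
             \bigl|I(\hat\theta_{\mathrm{MLE}})\bigr|^{-1/2}

% taking -2 log of both sides:
-2\log f(x) \approx
  \underbrace{-2\,l(\hat\theta_{\mathrm{MLE}}) + p\log n}_{\mathrm{BIC}}
  + \underbrace{\log\bigl|I(\hat\theta_{\mathrm{MLE}})\bigr|
    - p\log(2\pi) - 2\log\pi_0(\hat\theta_{\mathrm{MLE}})}_{C,\ O(1)\ \text{in}\ n}
```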