Week 4 Flashcards
What principle states that if two experiments x and y yield proportional likelihood functions for θ (L(θ|x) ∝ L(θ|y)), then the conclusions drawn about θ should be identical?
The Likelihood Principle.
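A standard illustration (assumed here; the cards do not name one): observing x = 3 successes in a fixed n = 12 Bernoulli trials gives a binomial likelihood L(θ|x) ∝ θ³(1−θ)⁹, while sampling until the 3rd success and needing 12 trials gives a negative-binomial likelihood proportional to the same θ³(1−θ)⁹; the Likelihood Principle says both experiments must lead to identical conclusions about θ.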
What is a fundamental criterion mentioned for a prior distribution to ensure the posterior corresponds to a finite probability measure?
The posterior must be normalizable (i.e., its integral over the parameter space must be finite).
What distinguishes a proper prior from an improper prior?
A proper prior integrates to 1 over the parameter space Θ, representing a valid probability distribution. An improper prior has an infinite integral (∫ π₀(θ) dθ = ∞) and is not a probability distribution.
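A concrete contrast, assumed for illustration: a Beta(2, 2) density on Θ = (0, 1) integrates to 1 and is proper, whereas the flat prior π₀(θ) = 1 on Θ = ℝ has ∫ π₀(θ) dθ = ∞ and is improper.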
When choosing a prior from a parametric family Pη = {π₀(θ; η) : η ∈ H}, what two main steps are involved?
1. Deciding on the family Pη.
2. Setting the hyperparameters η.
Why is the support of the prior family important?
The support must cover all plausible values of the parameter. The posterior support is a subset of the prior support, so a prior with too small support can incorrectly rule out parameter values.
In the multivariate case (θ ∈ ℝᵖ), what is a common simplification for choosing the prior π₀(θ)?
Using a product prior: π₀(θ) = π₀(θ₁, ..., θₚ) = ∏_{j=1}^p π₀(θⱼ), assuming prior independence (though posterior dependence can still exist).
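A common textbook instance (assumed here; the cards do not specify one): for a normal model with unknown mean μ and variance σ², one might take π₀(μ, σ²) = π₀(μ) π₀(σ²) with a normal prior on μ and an inverse-gamma prior on σ²; even with this prior independence, μ and σ² are generally dependent in the posterior.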
What is the main advantage of using conjugate priors?
They lead to posteriors within the same distributional family, simplifying calculations (often just updating hyperparameters) and offering analytical tractability.
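A minimal sketch of conjugate updating, using the Beta–Binomial pair as an assumed example (it is not named in the cards): a Beta(a, b) prior combined with x successes in n Bernoulli trials yields a Beta(a + x, b + n − x) posterior, so the update is just hyperparameter arithmetic.

```python
from scipy import stats

# Assumed example: the Beta prior is conjugate to the Binomial likelihood.
a, b = 2.0, 2.0   # prior hyperparameters, Beta(a, b)
x, n = 7, 10      # observed successes out of n trials

# Conjugacy: the posterior stays in the Beta family with updated hyperparameters.
a_post, b_post = a + x, b + (n - x)

posterior = stats.beta(a_post, b_post)
print(f"Posterior: Beta({a_post}, {b_post})")
print(f"Posterior mean: {posterior.mean():.3f}")  # (a + x) / (a + b + n)
```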
What is the suggestion attributed to Bayes for choosing a prior when there is no prior information about the parameter θ?
Use a uniform prior, π₀(θ) = c (constant), over the parameter space Θ.
What problem can arise when using a uniform prior π₀(θ) = c if the parameter space Θ has infinite measure (e.g., Θ = ℝ)?
The prior becomes improper (∫ π₀(θ) dθ = ∞).
Under what condition can an improper prior still lead to a valid Bayesian inference?
If the resulting posterior distribution π(θ|x) ∝ L(θ|x) π₀(θ) is proper (i.e., ∫ π(θ|x) dθ < ∞).
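A standard worked case, assumed for illustration: with x₁, ..., xₙ ~ N(θ, σ²), σ² known, and the improper flat prior π₀(θ) = 1 on ℝ, the posterior is π(θ|x) ∝ exp(−n(θ − x̄)²/(2σ²)), i.e., N(x̄, σ²/n), which is proper, so the inference is valid despite the improper prior.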
Are uniform priors invariant under reparameterization? Explain briefly.
No. If φ = g(θ) is a non-linear transformation, a uniform prior on θ (π₀(θ) = c) induces a non-uniform prior on φ: π_φ(φ) ∝ |dh(φ)/dφ|, where h = g⁻¹.
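A small simulation makes the non-invariance visible. This sketch assumes θ ~ Uniform(0, 1) and the non-linear map φ = g(θ) = θ²; with h(φ) = √φ, the change-of-variables formula gives π_φ(φ) = |dh/dφ| = 1/(2√φ), which is far from uniform:

```python
import numpy as np

rng = np.random.default_rng(0)
theta = rng.uniform(0.0, 1.0, size=100_000)  # draws from a uniform prior on theta
phi = theta**2                               # non-linear reparameterization

# Histogram of phi: mass piles up near 0, matching pi_phi(phi) = 1 / (2*sqrt(phi)).
counts, edges = np.histogram(phi, bins=10, range=(0.0, 1.0), density=True)
for lo, hi, c in zip(edges[:-1], edges[1:], counts):
    print(f"[{lo:.1f}, {hi:.1f}): density ≈ {c:.2f}")
```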
How is the Fisher information I(θ) defined for a single parameter θ?
I(θ) = E_{f(x|θ)}[ (∂/∂θ log f(x|θ))² ]
What is the alternative expression for the Fisher information I(θ) under regularity conditions?
I(θ) = −E_{f(x|θ)}[ ∂²/∂θ² log f(x|θ) ]
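A quick worked example (Bernoulli, assumed for illustration): with f(x|θ) = θˣ(1−θ)¹⁻ˣ, log f = x log θ + (1−x) log(1−θ), so ∂²/∂θ² log f = −x/θ² − (1−x)/(1−θ)², and taking −E[·] with E[x] = θ gives I(θ) = 1/θ + 1/(1−θ) = 1/(θ(1−θ)).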
How is Jeffreys’ prior for a single parameter θ defined?
π₀(θ) ∝ √I(θ)
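Continuing the Bernoulli example above: √I(θ) = (θ(1−θ))^(−1/2), so Jeffreys’ prior is π₀(θ) ∝ θ^(−1/2)(1−θ)^(−1/2), which is the Beta(1/2, 1/2) distribution.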
What is the key property of Jeffreys’ prior regarding reparameterization?
It is invariant under one-to-one reparameterizations: if φ = g(θ), then π_φ(φ) ∝ √I(φ).
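The reasoning, in one line: with h = g⁻¹, the change-of-variables formula gives π_φ(φ) = π₀(h(φ)) |h′(φ)|, while the Fisher information transforms as I_φ(φ) = I(h(φ)) (h′(φ))², so π₀(θ) ∝ √I(θ) implies π_φ(φ) ∝ √(I(h(φ))) |h′(φ)| = √(I_φ(φ)).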
How is the Fisher information defined for multiple parameters θ = (θ₁, ..., θₚ)?
As a p × p matrix I(θ) whose (i, j)-th element is (I(θ))_{i,j} = E_{f(x|θ)}[ −∂²/∂θᵢ∂θⱼ log f(x|θ) ].
How is Jeffreys’ prior defined for a multi-parameter θ?
π₀(θ) ∝ √det(I(θ))
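A standard worked case, assumed for illustration: for x ~ N(μ, σ²) with θ = (μ, σ), the Fisher information matrix is I(θ) = diag(1/σ², 2/σ²), so det I(θ) = 2/σ⁴ and Jeffreys’ prior is π₀(μ, σ) ∝ 1/σ² (itself improper).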
For n independent observations x = (x₁, ..., xₙ) from f(x|θ), how does the Fisher information Iₙ(θ) relate to the single-observation Fisher information I₁(θ)?
Iₙ(θ) = n · I₁(θ)
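This follows because for independent observations the log-likelihood is a sum, log f(x|θ) = Σᵢ log f(xᵢ|θ), so its second derivative is a sum of n i.i.d. terms, and taking −E[·] gives n copies of I₁(θ).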
Does Jeffreys’ prior satisfy the Likelihood Principle? Why or why not?
No, because the Fisher information I(θ) is calculated as an expectation over the entire sample space (all possible data x), not just from the observed likelihood function L(θ|x).
What is the process of extracting expert prior knowledge to formulate a suitable prior distribution called?
Elicitation.
What is the most informative description of the parameter θ in a Bayesian analysis?
The posterior distribution π(θ|x).
How is a Bayes estimator θ̂ defined in terms of a loss function L(θ, a)?
θ̂ = arg min_a E_{π(θ|x)}[L(θ, a)] = arg min_a ∫ L(θ, a) π(θ|x) dθ
What point estimate is the Bayes estimator with respect to the quadratic loss function L(θ, a) = (θ − a)²?
The posterior mean, E_{π(θ|x)}[θ].
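To see why, differentiate the expected loss: d/da E_{π(θ|x)}[(θ − a)²] = 2a − 2 E_{π(θ|x)}[θ], which vanishes exactly at a = E_{π(θ|x)}[θ], and the second derivative 2 > 0 confirms a minimum.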
What point estimate is the Bayes estimator with respect to the absolute error loss function L(θ, a) = |θ − a|?
The posterior median, Med_{π(θ|x)}[θ].
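A minimal numerical sketch tying the last three cards together, assuming a Gamma(3, 1) posterior purely for illustration (it is skewed, so mean and median differ visibly): minimizing the expected quadratic and absolute losses over a grid of actions recovers the posterior mean and median, respectively.

```python
import numpy as np
from scipy import stats

# Assumed illustrative posterior: Gamma(shape=3, scale=1).
posterior = stats.gamma(a=3.0)
theta = posterior.rvs(size=50_000, random_state=0)  # Monte Carlo draws

# Grid of candidate actions a; the Bayes estimator minimizes expected loss.
actions = np.linspace(0.0, 8.0, 401)
quad_risk = [np.mean((theta - a) ** 2) for a in actions]
abs_risk = [np.mean(np.abs(theta - a)) for a in actions]

print(f"argmin quadratic loss: {actions[np.argmin(quad_risk)]:.2f}")
print(f"posterior mean:        {theta.mean():.2f}")      # ≈ 3.00
print(f"argmin absolute loss:  {actions[np.argmin(abs_risk)]:.2f}")
print(f"posterior median:      {np.median(theta):.2f}")  # ≈ 2.67
```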