Week 4 Flashcards
What principle states that if two experiments x and y yield proportional likelihood functions for θ (L(θ|x) ∝ L(θ|y)), then the conclusions drawn about θ should be identical?
The Likelihood Principle.
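A standard illustration (assumed here; the cards do not name one): observing x = 3 successes in a fixed n = 12 Bernoulli trials gives a binomial likelihood L(θ|x) ∝ θ³(1−θ)⁹, while sampling until the 3rd success and needing 12 trials gives a negative-binomial likelihood proportional to the same θ³(1−θ)⁹; the Likelihood Principle says both experiments must lead to identical conclusions about θ.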
What is a fundamental criterion mentioned for a prior distribution to ensure the posterior corresponds to a finite probability measure?
The posterior must be normalizable (i.e., its integral over the parameter space must be finite).
What distinguishes a proper prior from an improper prior?
A proper prior integrates to 1 over the parameter space Θ, representing a valid probability distribution. An improper prior has an infinite integral (∫ π₀(θ) dθ = ∞) and is not a probability distribution.
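A concrete contrast, assumed for illustration: a Beta(2, 2) density on Θ = (0, 1) integrates to 1 and is proper, whereas the flat prior π₀(θ) = 1 on Θ = ℝ has ∫ π₀(θ) dθ = ∞ and is improper.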
When choosing a prior from a parametric family Pη = {π₀(θ; η) : η ∈ H}, what two main steps are involved?
1. Deciding on the family Pη.
2. Setting the hyperparameters η.
Why is the support of the prior family important?
The support must cover all plausible values of the parameter. The posterior support is a subset of the prior support, so a prior with too small support can incorrectly rule out parameter values.
In the multivariate case (θ ∈ ℝᵖ), what is a common simplification for choosing the prior π₀(θ)?
Using a product prior: π₀(θ) = π₀(θ₁, ..., θₚ) = ∏_{j=1}^p π₀(θⱼ), assuming prior independence (though posterior dependence can still exist).
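A common textbook instance (assumed here; the cards do not specify one): for a normal model with unknown mean μ and variance σ², one might take π₀(μ, σ²) = π₀(μ) π₀(σ²) with a normal prior on μ and an inverse-gamma prior on σ²; even with this prior independence, μ and σ² are generally dependent in the posterior.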
What is the main advantage of using conjugate priors?
They lead to posteriors within the same distributional family, simplifying calculations (often just updating hyperparameters) and offering analytical tractability.
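A minimal sketch of conjugate updating, using the Beta–Binomial pair as an assumed example (it is not named in the cards): a Beta(a, b) prior combined with x successes in n Bernoulli trials yields a Beta(a + x, b + n − x) posterior, so the update is just hyperparameter arithmetic.

```python
from scipy import stats

# Assumed example: the Beta prior is conjugate to the Binomial likelihood.
a, b = 2.0, 2.0   # prior hyperparameters, Beta(a, b)
x, n = 7, 10      # observed successes out of n trials

# Conjugacy: the posterior stays in the Beta family with updated hyperparameters.
a_post, b_post = a + x, b + (n - x)

posterior = stats.beta(a_post, b_post)
print(f"Posterior: Beta({a_post}, {b_post})")
print(f"Posterior mean: {posterior.mean():.3f}")  # (a + x) / (a + b + n)
```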
What is the suggestion attributed to Bayes for choosing a prior when there is no prior information about the parameter θ?
Use a uniform prior, π₀(θ) = c (constant), over the parameter space Θ.
What problem can arise when using a uniform prior π₀(θ) = c if the parameter space Θ has infinite measure (e.g., Θ = ℝ)?
The prior becomes improper (∫ π₀(θ) dθ = ∞).
Under what condition can an improper prior still lead to a valid Bayesian inference?
If the resulting posterior distribution π(θ|x) ∝ L(θ|x) π₀(θ) is proper (i.e., ∫ π(θ|x) dθ < ∞).
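A standard worked case, assumed for illustration: with x₁, ..., xₙ ~ N(θ, σ²), σ² known, and the improper flat prior π₀(θ) = 1 on ℝ, the posterior is π(θ|x) ∝ exp(−n(θ − x̄)²/(2σ²)), i.e., N(x̄, σ²/n), which is proper, so the inference is valid despite the improper prior.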
Are uniform priors invariant under reparameterization? Explain briefly.
No. If φ = g(θ) is a non-linear transformation, a uniform prior on θ (π₀(θ) = c) induces a non-uniform prior on φ: π_φ(φ) ∝ |dh(φ)/dφ|, where h = g⁻¹.
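A small simulation makes the non-invariance visible. This sketch assumes θ ~ Uniform(0, 1) and the non-linear map φ = g(θ) = θ²; with h(φ) = √φ, the change-of-variables formula gives π_φ(φ) = |dh/dφ| = 1/(2√φ), which is far from uniform:

```python
import numpy as np

rng = np.random.default_rng(0)
theta = rng.uniform(0.0, 1.0, size=100_000)  # draws from a uniform prior on theta
phi = theta**2                               # non-linear reparameterization

# Histogram of phi: mass piles up near 0, matching pi_phi(phi) = 1 / (2*sqrt(phi)).
counts, edges = np.histogram(phi, bins=10, range=(0.0, 1.0), density=True)
for lo, hi, c in zip(edges[:-1], edges[1:], counts):
    print(f"[{lo:.1f}, {hi:.1f}): density ≈ {c:.2f}")
```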
How is the Fisher information I(θ) defined for a single parameter θ?
I(θ) = E_{f(x|θ)}[ (∂/∂θ log f(x|θ))² ]
What is the alternative expression for the Fisher information I(θ) under regularity conditions?
I(θ) = −E_{f(x|θ)}[ ∂²/∂θ² log f(x|θ) ]
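A quick worked example (Bernoulli, assumed for illustration): with f(x|θ) = θˣ(1−θ)¹⁻ˣ, log f = x log θ + (1−x) log(1−θ), so ∂²/∂θ² log f = −x/θ² − (1−x)/(1−θ)², and taking −E[·] with E[x] = θ gives I(θ) = 1/θ + 1/(1−θ) = 1/(θ(1−θ)).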
How is Jeffreys’ prior for a single parameter θ defined?
π₀(θ) ∝ √I(θ)
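Continuing the Bernoulli example above: √I(θ) = (θ(1−θ))^(−1/2), so Jeffreys’ prior is π₀(θ) ∝ θ^(−1/2)(1−θ)^(−1/2), which is the Beta(1/2, 1/2) distribution.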
What is the key property of Jeffreys’ prior regarding reparameterization?
It is invariant under one-to-one reparameterizations: if φ = g(θ), then π_φ(φ) ∝ √I(φ).
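The reasoning, in one line: with h = g⁻¹, the change-of-variables formula gives π_φ(φ) = π₀(h(φ)) |h′(φ)|, while the Fisher information transforms as I_φ(φ) = I(h(φ)) (h′(φ))², so π₀(θ) ∝ √I(θ) implies π_φ(φ) ∝ √(I(h(φ))) |h′(φ)| = √(I_φ(φ)).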
How is the Fisher information defined for multiple parameters θ = (θ₁, ..., θₚ)?
As a p × p matrix I(θ) whose (i, j)-th element is (I(θ))_{i,j} = E_{f(x|θ)}[ −∂²/∂θᵢ∂θⱼ log f(x|θ) ].
How is Jeffreys’ prior defined for a multi-parameter θ?
π₀(θ) ∝ √det(I(θ))
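A standard worked case, assumed for illustration: for x ~ N(μ, σ²) with θ = (μ, σ), the Fisher information matrix is I(θ) = diag(1/σ², 2/σ²), so det I(θ) = 2/σ⁴ and Jeffreys’ prior is π₀(μ, σ) ∝ 1/σ² (itself improper).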
For n independent observations x = (x₁, ..., xₙ) from f(x|θ), how does the Fisher information Iₙ(θ) relate to the single-observation Fisher information I₁(θ)?
Iₙ(θ) = n · I₁(θ)
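This follows because for independent observations the log-likelihood is a sum, log f(x|θ) = Σᵢ log f(xᵢ|θ), so its second derivative is a sum of n i.i.d. terms, and taking −E[·] gives n copies of I₁(θ).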
Does Jeffreys’ prior satisfy the Likelihood Principle? Why or why not?
No, because the Fisher information I(θ) is calculated as an expectation over the entire sample space (all possible data x), not just from the observed likelihood function L(θ|x).
What is the process of extracting expert prior knowledge to formulate a suitable prior distribution called?
Elicitation.
What is the most informative description of the parameter θ in a Bayesian analysis?
The posterior distribution π(θ|x).
How is a Bayes estimator θ̂ defined in terms of a loss function L(θ, a)?
θ̂ = arg min_a E_{π(θ|x)}[L(θ, a)] = arg min_a ∫ L(θ, a) π(θ|x) dθ
What point estimate is the Bayes estimator with respect to the quadratic loss function L(θ, a) = (θ − a)²?
The posterior mean, E_{π(θ|x)}[θ].
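To see why, differentiate the expected loss: d/da E_{π(θ|x)}[(θ − a)²] = 2a − 2 E_{π(θ|x)}[θ], which vanishes exactly at a = E_{π(θ|x)}[θ], and the second derivative 2 > 0 confirms a minimum.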
What point estimate is the Bayes estimator with respect to the absolute error loss function L(θ, a) = |θ − a|?
The posterior median, Med_{π(θ|x)}[θ].
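A minimal numerical sketch tying the last three cards together, assuming a Gamma(3, 1) posterior purely for illustration (it is skewed, so mean and median differ visibly): minimizing the expected quadratic and absolute losses over a grid of actions recovers the posterior mean and median, respectively.

```python
import numpy as np
from scipy import stats

# Assumed illustrative posterior: Gamma(shape=3, scale=1).
posterior = stats.gamma(a=3.0)
theta = posterior.rvs(size=50_000, random_state=0)  # Monte Carlo draws

# Grid of candidate actions a; the Bayes estimator minimizes expected loss.
actions = np.linspace(0.0, 8.0, 401)
quad_risk = [np.mean((theta - a) ** 2) for a in actions]
abs_risk = [np.mean(np.abs(theta - a)) for a in actions]

print(f"argmin quadratic loss: {actions[np.argmin(quad_risk)]:.2f}")
print(f"posterior mean:        {theta.mean():.2f}")      # ≈ 3.00
print(f"argmin absolute loss:  {actions[np.argmin(abs_risk)]:.2f}")
print(f"posterior median:      {np.median(theta):.2f}")  # ≈ 2.67
```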