Week 4 Flashcards

1
Q

What principle states that if two experiments yield data x and y with proportional likelihood functions for θ (L(θ|x) ∝ L(θ|y)), then the conclusions drawn about θ should be identical?

A

The Likelihood Principle.
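
A standard illustration (not on the card): observing 3 successes in 12 Bernoulli trials gives L(θ|x) ∝ θ³(1-θ)⁹ whether n = 12 was fixed in advance (binomial sampling) or trials continued until the 3rd success (negative binomial sampling); the Likelihood Principle then requires identical conclusions about θ from either experiment.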

2
Q

What is a fundamental criterion mentioned for a prior distribution to ensure the posterior corresponds to a finite probability measure?

A

The posterior must be normalizable (i.e., its integral over the parameter space must be finite).

3
Q

What distinguishes a proper prior from an improper prior?

A

A proper prior integrates to 1 over the parameter space Θ, representing a valid probability distribution. An improper prior has an infinite integral (∫π₀(θ)dθ = ∞) and is not a probability distribution.
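
For instance (an added example, taking Θ = (0, ∞)): π₀(θ) = e^{-θ} integrates to 1 and is proper, whereas π₀(θ) = 1/θ has ∫₀^∞ θ⁻¹ dθ = ∞ and is improper.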

4
Q

When choosing a prior from a parametric family Pη = {π₀(θ; η) : η ∈ H}, what two main steps are involved?

A

1. Deciding on the family Pη.
2. Setting the hyperparameters η.
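
For example (illustrative, not from the card): for θ ∈ (0, 1) one might take Pη = {Beta(α, β)} with η = (α, β), then set η so that the prior mean α/(α+β) and prior "sample size" α+β match elicited beliefs.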
5
Q

Why is the support of the prior family important?

A

The support must cover all plausible values of the parameter. The posterior support is a subset of the prior support, so a prior with too small support can incorrectly rule out parameter values.

6
Q

In the multivariate case (θ ∈ R^p), what is a common simplification for choosing the prior π₀(θ)?

A

Using a product prior: π₀(θ) = π₀(θ₁, …, θp) = Π_{j=1}^p π₀(θj), assuming prior independence (though posterior dependence can still exist).
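
For instance (an added illustration): for a normal model with θ = (μ, σ²), a product prior might take μ ~ N(μ₀, τ²) independently of σ² ~ Inv-Gamma(a, b); even so, μ and σ² are generally dependent under the posterior π(θ|x).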

7
Q

What is the main advantage of using conjugate priors?

A

They lead to posteriors within the same distributional family, simplifying calculations (often just updating hyperparameters) and offering analytical tractability.
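
A standard worked example (assuming a binomial model): with x | θ ~ Binomial(n, θ) and θ ~ Beta(α, β), the posterior is π(θ|x) ∝ θ^{x+α-1}(1-θ)^{n-x+β-1}, i.e. Beta(α + x, β + n - x), so conjugacy reduces the update to α → α + x, β → β + n - x.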

8
Q

What is the suggestion attributed to Bayes for choosing a prior when there is no prior information about the parameter θ?

A

Use a uniform prior, π₀(θ) = c (constant), over the parameter space Θ.

9
Q

What problem can arise when using a uniform prior π₀(θ) = c if the parameter space Θ has infinite measure (e.g., Θ = R)?

A

The prior becomes improper (∫π₀(θ)dθ = ∞).

10
Q

Under what condition can an improper prior still lead to a valid Bayesian inference?

A

If the resulting posterior distribution π(θ|x) ∝ L(θ|x)π₀(θ) is proper (i.e., ∫π(θ|x)dθ < ∞).
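
A common example (added for illustration): for x₁, …, x_n ~ N(μ, σ²) with σ² known and the improper flat prior π₀(μ) = 1 on ℝ, the posterior is π(μ|x) ∝ exp(-n(μ - x̄)²/(2σ²)), i.e. N(x̄, σ²/n), which is proper.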

11
Q

Are uniform priors invariant under reparameterization? Explain briefly.

A

No. If φ = g(θ) is a non-linear transformation, a uniform prior on θ (π₀(θ) = c) leads to a non-uniform prior on φ (π_φ(φ) ∝ |dh(φ)/dφ|, where h = g⁻¹).
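
Worked example (added): if θ has the uniform prior on (0, 1) and φ = g(θ) = θ², then h(φ) = √φ and π_φ(φ) ∝ |dh(φ)/dφ| = 1/(2√φ) on (0, 1), which is not uniform.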

12
Q

How is the Fisher information I(θ) defined for a single parameter θ?

A

I(θ) = E_{f(x|θ)} [ (∂/∂θ log f(x|θ))² ]
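
Worked example (added, Bernoulli model): for f(x|θ) = θ^x (1-θ)^{1-x}, log f = x log θ + (1-x) log(1-θ), so (∂/∂θ) log f = x/θ - (1-x)/(1-θ), and taking the expectation of its square (or of minus the second derivative) gives I(θ) = 1/(θ(1-θ)).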

13
Q

What is the alternative expression for Fisher information I(θ) under regularity conditions?

A

I(θ) = -E_{f(x|θ)} [ ∂²/∂θ² log f(x|θ) ]

14
Q

How is Jeffreys’ prior for a single parameter θ defined?

A

π₀(θ) ∝ √I(θ)
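
Continuing the Bernoulli example (added): I(θ) = 1/(θ(1-θ)) gives π₀(θ) ∝ θ^{-1/2}(1-θ)^{-1/2}, i.e. a Beta(1/2, 1/2) prior. For a Poisson(λ) model, I(λ) = 1/λ gives the improper Jeffreys prior π₀(λ) ∝ λ^{-1/2}.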

15
Q

What is the key property of Jeffreys’ prior regarding reparameterization?

A

It is invariant under one-to-one reparameterizations. If φ = g(θ), then π_φ(φ) ∝ √I(φ).
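
Sketch of why (added): for one-to-one φ = g(θ), the chain rule gives I(φ) = I(θ(φ)) (dθ/dφ)², so transforming π₀(θ) ∝ √I(θ) by the change-of-variables formula yields π_φ(φ) = π₀(θ(φ)) |dθ/dφ| ∝ √(I(θ(φ)) (dθ/dφ)²) = √I(φ).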

16
Q

How is Fisher information defined for multiple parameters θ = (θ₁, …, θp)?

A

As a matrix I(θ) whose (i, j)-th element is (I(θ))_{i,j} = E_{f(x|θ)} [ -∂²/∂θ_i∂θ_j log f(x|θ) ].

17
Q

How is Jeffreys’ prior defined for a multi-parameter θ?

A

π₀(θ) ∝ √det(I(θ))
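
A standard example (added): for x ~ N(μ, σ²) with θ = (μ, σ), the Fisher information matrix is diag(1/σ², 2/σ²), so det(I(θ)) = 2/σ⁴ and Jeffreys' prior is π₀(μ, σ) ∝ 1/σ².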

18
Q

For n independent observations x = (x₁, …, x_n) from f(x|θ), how does the Fisher information I_n(θ) relate to the single-observation Fisher information I₁(θ)?

A

I_n(θ) = n I₁(θ)
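
Reasoning (added): independence gives log f(x|θ) = Σᵢ log f(xᵢ|θ), so -∂²/∂θ² log f(x|θ) is a sum of n identically distributed terms, and taking expectations yields I_n(θ) = n I₁(θ).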

19
Q

Does Jeffreys’ prior satisfy the Likelihood Principle? Why or why not?

A

No, because Fisher Information I(θ) is calculated as an expectation over the entire sample space (all possible data x), not just based on the observed likelihood function L(θ|x).
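
The usual illustration (added): 3 successes in 12 trials gives the same likelihood θ³(1-θ)⁹ under binomial and negative binomial sampling, yet Jeffreys' prior is ∝ θ^{-1/2}(1-θ)^{-1/2} in the binomial design (I(θ) = n/(θ(1-θ))) but ∝ θ^{-1}(1-θ)^{-1/2} in the negative binomial design (I(θ) = r/(θ²(1-θ))), so the same likelihood leads to different priors and hence different posteriors.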

20
Q

What is the process of extracting expert prior knowledge to formulate a suitable prior distribution called?

A

Elicitation.

21
Q

What is the most informative description of the parameter θ in a Bayesian analysis?

A

The posterior distribution π(θ|x).

22
Q

How is a Bayes estimator θ̂ defined in terms of a loss function L(θ, a)?

A

θ̂ = arg min_a E_{π(θ|x)}[L(θ, a)] = arg min_a ∫ L(θ, a) π(θ|x) dθ

23
Q

What point estimate is the Bayes estimator with respect to the quadratic loss function L(θ, a) = (θ - a)²?

A

The posterior mean, E_{π(θ|x)}[θ].
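
Sketch (added): E_{π(θ|x)}[(θ - a)²] = E[θ²|x] - 2a E[θ|x] + a²; differentiating with respect to a and setting to zero gives a = E[θ|x], the posterior mean.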

24
Q

What point estimate is the Bayes estimator with respect to the absolute error loss function L(θ, a) = |θ - a|?

A

The posterior median, Med_{π(θ|x)}[θ].

25
Q

What point estimate is the Bayes estimator with respect to the zero-one loss function L(θ, a) = 1 if a ≠ θ, 0 if a = θ (especially for discrete θ)?

A

The posterior mode, arg max_θ π(θ|x).
26
Q

What is another name for the posterior mode?

A

The maximum a posteriori (MAP) estimate.