Week 2 Flashcards

1
Q

State Bayes’ Theorem for random variables Y and X in terms of their conditional and marginal densities.

A

f_{Y|X}(y|x) = [f_{X|Y}(x|y) * f_Y(y)] / f_X(x), where f_X(x) = ∫ f_{X|Y}(x|y) * f_Y(y) dy.
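
As a concrete illustration, here is a minimal numerical sketch (assuming Python with NumPy/SciPy and an illustrative model, Y ~ N(0, 1) with X | Y = y ~ N(y, 1), chosen purely for concreteness):

```python
import numpy as np
from scipy import stats
from scipy.integrate import trapezoid

# Illustrative model (an assumption of this sketch):
# Y ~ N(0, 1), X | Y = y ~ N(y, 1)
y = np.linspace(-10, 10, 2001)                 # grid over the support of Y
f_Y = stats.norm.pdf(y, loc=0, scale=1)        # marginal density f_Y(y)
x_obs = 1.5                                    # an observed value of X
f_X_given_Y = stats.norm.pdf(x_obs, loc=y, scale=1)  # f_{X|Y}(x|y) along the grid

# Marginal density: f_X(x) = ∫ f_{X|Y}(x|y) f_Y(y) dy (trapezoid rule)
f_X = trapezoid(f_X_given_Y * f_Y, y)

# Bayes' Theorem: f_{Y|X}(y|x) = f_{X|Y}(x|y) f_Y(y) / f_X(x)
f_Y_given_X = f_X_given_Y * f_Y / f_X

print(trapezoid(f_Y_given_X, y))  # ≈ 1.0: the conditional density integrates to 1
```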

2
Q

In Bayesian analysis, how is the parameter θ treated, and what represents the initial beliefs about it?

A

θ is treated as a random variable with a prior density π₀(θ) encapsulating beliefs about θ before observing data.

3
Q

Write down the formula for the posterior distribution π(θ|x) using Bayes’ Theorem, given data x = (x1, …, xn).

A

π(θ|x) = [Π_{i=1}^n f_{X|θ}(xi|θ) * π₀(θ)] / f(x) = [L(θ, x) * π₀(θ)] / ∫ L(θ, x) * π₀(θ) dθ

4
Q

What is the likelihood function, L(θ, x), in the context of Bayesian inference?

A

L(θ, x) = Π_{i=1}^n f_{X|θ}(xi|θ), representing the probability (or density) of observing the data x given a specific value of the parameter θ.
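
A short sketch (assuming an illustrative N(θ, 1) model and NumPy/SciPy) of evaluating L(θ, x) at one candidate θ; in practice the log-likelihood is summed rather than multiplying densities, to avoid numerical underflow:

```python
import numpy as np
from scipy import stats

x = np.array([1.2, 0.7, 2.1, 1.5])   # hypothetical data
theta = 1.0                          # a candidate parameter value

# Likelihood: product of the densities f_{X|θ}(x_i|θ), here N(θ, 1)
L = np.prod(stats.norm.pdf(x, loc=theta, scale=1))

# Log-likelihood: sum of log-densities (numerically safer for large n)
log_L = np.sum(stats.norm.logpdf(x, loc=theta, scale=1))

print(np.isclose(L, np.exp(log_L)))  # True
```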

5
Q

What is the term for the denominator in the Bayes’ Theorem formula for π(θ|x), and what does it represent?

A

The denominator, f(x) = ∫ L(θ, x) π₀(θ) dθ, is called the marginal likelihood or evidence. It represents the marginal probability (density) of observing the data x, integrated over all possible values of θ.

6
Q

What is the proportionality relationship used for calculating the posterior distribution, ignoring the normalizing constant?

A

π(θ|x) ∝ L(θ, x) * π₀(θ)
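
This proportionality is exactly what grid-based computation exploits: evaluate L(θ, x) * π₀(θ) on a grid over θ and renormalize numerically. A sketch, assuming an illustrative Bernoulli likelihood and a uniform prior:

```python
import numpy as np
from scipy.integrate import trapezoid

x = np.array([1, 0, 1, 1, 0, 1])          # hypothetical Bernoulli data
theta = np.linspace(0.001, 0.999, 999)    # grid over θ ∈ (0, 1)
prior = np.ones_like(theta)               # π₀(θ) ∝ 1 (uniform)

# Unnormalized posterior: L(θ, x) * π₀(θ)
likelihood = theta ** x.sum() * (1 - theta) ** (len(x) - x.sum())
unnorm = likelihood * prior

# The normalizing constant is the marginal likelihood f(x)
evidence = trapezoid(unnorm, theta)
posterior = unnorm / evidence

print(trapezoid(posterior, theta))  # ≈ 1.0
```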

7
Q

Describe how Bayesian updating works sequentially when a new datum x2 arrives after observing x1.

A

The posterior after x1, π(θ|x1), becomes the prior for processing x2. Assuming the observations are conditionally independent given θ, the new posterior is π(θ|x1, x2) ∝ f_{X|θ}(x2|θ) * π(θ|x1).
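
A sketch checking this numerically (same illustrative Bernoulli-on-a-grid setup as above): updating on x1 and then on x2 gives the same posterior as processing (x1, x2) in one batch.

```python
import numpy as np
from scipy import stats
from scipy.integrate import trapezoid

theta = np.linspace(0.001, 0.999, 999)
prior = np.ones_like(theta)              # uniform prior over θ

def update(density, datum):
    """One Bayesian update with a Bernoulli(θ) likelihood (illustrative)."""
    unnorm = stats.bernoulli.pmf(datum, theta) * density
    return unnorm / trapezoid(unnorm, theta)

x1, x2 = 1, 0
post_sequential = update(update(prior, x1), x2)      # x1 first, then x2

unnorm_batch = (stats.bernoulli.pmf(x1, theta)
                * stats.bernoulli.pmf(x2, theta) * prior)
post_batch = unnorm_batch / trapezoid(unnorm_batch, theta)

print(np.allclose(post_sequential, post_batch))      # True
```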

8
Q

If T = T(X) is a sufficient statistic for θ, how does this simplify the calculation of the posterior distribution π(θ|x)?

A

The posterior distribution depends on the data x only through the value of the sufficient statistic T(x). That is, π(θ|x) ∝ g(T(x), θ) * π₀(θ), where L(θ, x) = g(T(x), θ)h(x) by the Factorization Theorem.
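
A quick numerical check (illustrative Bernoulli model, where T(x) = Σ x_i is sufficient): two different samples with the same value of T(x) produce identical posteriors.

```python
import numpy as np
from scipy.integrate import trapezoid

theta = np.linspace(0.001, 0.999, 999)
prior = np.ones_like(theta)              # uniform prior

def posterior(x):
    """Grid posterior for i.i.d. Bernoulli(θ) data (illustrative)."""
    likelihood = theta ** x.sum() * (1 - theta) ** (len(x) - x.sum())
    unnorm = likelihood * prior
    return unnorm / trapezoid(unnorm, theta)

# Different data, same sufficient statistic T(x) = Σ x_i = 2
x_a = np.array([1, 1, 0, 0, 0])
x_b = np.array([0, 0, 1, 0, 1])
print(np.allclose(posterior(x_a), posterior(x_b)))  # True
```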

9
Q

What are the two main computational/analytical challenges mentioned in Bayesian inference related to the posterior and marginal likelihood?

A

1. Evaluating the marginal likelihood integral f(x) = ∫ L(θ, x) π₀(θ) dθ.
2. Determining the distributional form of the posterior π(θ|x).

10
Q

What is a conjugate prior family P for a class of likelihood distributions F = {f_{X|θ}(x|θ)}?

A

P is conjugate for F if, for any prior π₀(θ) ∈ P and any likelihood f_{X|θ}(x|θ) ∈ F, the resulting posterior distribution π(θ|x) is also in the family P.
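
The standard example is the Beta family, which is conjugate for the Bernoulli likelihood: a Beta(a, b) prior combined with n Bernoulli observations yields a Beta(a + Σ x_i, b + n − Σ x_i) posterior. A sketch, with hypothetical hyperparameters and data:

```python
from scipy import stats

a, b = 2.0, 3.0                  # prior hyperparameters (hypothetical)
x = [1, 0, 1, 1, 0, 1]           # Bernoulli data (hypothetical)
s, n = sum(x), len(x)

# Conjugacy: the posterior stays in the Beta family
posterior = stats.beta(a + s, b + n - s)
print(posterior.mean())          # posterior mean of θ
```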

11
Q

What is the main advantage of using a conjugate prior?

A

It leads to an analytically tractable posterior calculation, meaning the form of the posterior distribution is known and often easy to compute.

12
Q

Write the general form of a k-parameter exponential family pdf/pmf, f_{X|θ}(x|θ).

A

f_{X|θ}(x|θ) = h(x) * c(θ) * exp[ Σ_{j=1}^k t_j(x) * w_j(θ) ]
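
As a worked instance, the Bernoulli(θ) pmf is a regular 1-parameter exponential family with h(x) = 1, c(θ) = 1 − θ, t(x) = x and w(θ) = log(θ / (1 − θ)); the sketch below verifies the factorization numerically:

```python
import numpy as np

# Bernoulli(θ): f(x|θ) = θ^x (1-θ)^(1-x) = h(x) c(θ) exp[t(x) w(θ)]
# with h(x) = 1, c(θ) = 1 - θ, t(x) = x, w(θ) = log(θ / (1 - θ))
theta = 0.3
for x in (0, 1):
    direct = theta**x * (1 - theta)**(1 - x)
    expfam = 1.0 * (1 - theta) * np.exp(x * np.log(theta / (1 - theta)))
    print(x, np.isclose(direct, expfam))  # True for both x = 0 and x = 1
```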

13
Q

What are the components h(x), c(θ), t_j(x), and w_j(θ) in the exponential family definition?

A

h(x) is a function of x only; c(θ) is a function of θ only (related to the normalizing constant); t_j(x) are the sufficient statistics; w_j(θ) are functions of the parameters (often called natural parameters).

14
Q

When is an exponential family called ‘regular’?

A

The family is regular if the support of the distribution, denoted by the set X, does not depend on the parameter θ.

15
Q

What is the form of the conjugate prior π₀(θ) for a parameter θ of a regular k-parameter exponential family likelihood?

A

π₀(θ) = d(α, β) * [c(θ)]^α * exp[ Σ_{j=1}^k β_j * w_j(θ) ], where α and β = (β1, …, βk) are hyperparameters and d(α, β) is the prior normalizing constant.
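
As a worked instance (using the Bernoulli parameterization c(θ) = 1 − θ, w(θ) = log(θ / (1 − θ)) from the exponential-family card above), this conjugate form reduces to a Beta density:

```latex
\pi_0(\theta) \propto [c(\theta)]^{\alpha} \exp[\beta\, w(\theta)]
  = (1-\theta)^{\alpha} \exp\!\left[\beta \log\tfrac{\theta}{1-\theta}\right]
  = \theta^{\beta} (1-\theta)^{\alpha-\beta},
```

which is the kernel of a Beta(β + 1, α − β + 1) distribution, so the conjugate family for the Bernoulli likelihood is the Beta family.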

16
Q

Given a sample x = (x1, …, xn) from a regular exponential family and a conjugate prior as defined above, what is the form of the posterior distribution π(θ|x)?

A

The posterior is proportional to [c(θ)]^(α+n) * exp[ Σ_{j=1}^k (β_j + Σ_{i=1}^n t_j(xi)) * w_j(θ) ]. It has the same form as the prior but with updated hyperparameters.

17
Q

How are the hyperparameters (α, β) updated to obtain the posterior hyperparameters (α*, β*) for the conjugate prior of a regular exponential family after observing data x = (x1, …, xn)?

A

α* = α + n; β_j* = β_j + Σ_{i=1}^n t_j(xi) for j = 1, …, k.
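
A minimal sketch of this update rule (generic over the sufficient statistics t_j, demonstrated with the Bernoulli case where k = 1 and t_1(x) = x; the hyperparameter values are hypothetical):

```python
def conjugate_update(alpha, beta, x, t_funcs):
    """Posterior hyperparameters for a regular exponential-family likelihood:
    alpha* = alpha + n,  beta_j* = beta_j + sum_i t_j(x_i)."""
    n = len(x)
    alpha_star = alpha + n
    beta_star = [b + sum(t(xi) for xi in x) for b, t in zip(beta, t_funcs)]
    return alpha_star, beta_star

alpha, beta = 4.0, [2.0]          # prior hyperparameters (hypothetical)
x = [1, 0, 1, 1, 0, 1]            # Bernoulli data: n = 6, Σ x_i = 4
print(conjugate_update(alpha, beta, x, [lambda xi: xi]))
# -> (10.0, [6.0]): α* = α + n, β₁* = β₁ + Σ x_i
```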