Week 2 Flashcards
State Bayes’ Theorem for random variables Y and X in terms of their conditional and marginal densities.
f_{Y|X}(y|x) = [f_{X|Y}(x|y) * f_Y(y)] / f_X(x), where f_X(x) = ∫ f_{X|Y}(x|y) * f_Y(y) dy.
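A quick numeric check of the theorem (a sketch of my own, assuming numpy and scipy; the model Y ~ N(0,1), X|Y=y ~ N(y,1) is an illustrative choice, for which f_{Y|X}(·|x) is known to be N(x/2, 1/2)):

```python
import numpy as np
from scipy.stats import norm

x = 1.3                                        # observed value of X (arbitrary)
y = np.linspace(-6, 6, 2001)                   # grid over Y
dy = y[1] - y[0]

numerator = norm.pdf(x, loc=y) * norm.pdf(y)   # f_{X|Y}(x|y) * f_Y(y)
f_X = numerator.sum() * dy                     # ≈ ∫ f_{X|Y} f_Y dy
posterior = numerator / f_X                    # f_{Y|X}(y|x)

# Agrees with the known N(x/2, 1/2) posterior for this model.
assert np.allclose(posterior, norm.pdf(y, loc=x/2, scale=np.sqrt(0.5)), atol=1e-5)
```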
In Bayesian analysis, how is the parameter θ treated, and what represents the initial beliefs about it?
θ is treated as a random variable with a prior density π₀(θ) encapsulating beliefs about θ before observing data.
Write down the formula for the posterior distribution π(θ|x) using Bayes’ Theorem, given data x = (x1, …, xn).
π(θ|x) = [Π_{i=1}^n f_{X|θ}(xi|θ) * π₀(θ)] / f(x) = [L(θ, x) * π₀(θ)] / ∫ L(θ, x) * π₀(θ) dθ
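A grid-based sketch of this formula (illustrative; the Bernoulli data and the flat Uniform(0,1) prior are my own choices, and numpy is assumed):

```python
import numpy as np

x = np.array([1, 0, 1, 1, 0, 1])               # observed data x1..xn
theta = np.linspace(0.001, 0.999, 999)         # grid over θ
dtheta = theta[1] - theta[0]

# L(θ, x) = Π f_{X|θ}(xi|θ) for a Bernoulli likelihood
L = np.prod(theta[:, None]**x * (1 - theta[:, None])**(1 - x), axis=1)
prior = np.ones_like(theta)                    # π₀(θ) = 1 on (0, 1)
unnorm = L * prior
posterior = unnorm / (unnorm.sum() * dtheta)   # ≈ dividing by ∫ L(θ, x) π₀(θ) dθ
```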
What is the likelihood function, L(θ, x), in the context of Bayesian inference?
L(θ, x) = Π_{i=1}^n f_{X|θ}(xi|θ): the joint probability (or density) of the observed data x under parameter value θ, viewed as a function of θ for fixed x.
What is the term for the denominator in the Bayes’ Theorem formula for π(θ|x), and what does it represent?
The denominator, f(x) = ∫ L(θ, x) π₀(θ) dθ, is called the marginal likelihood or evidence. It represents the marginal probability (density) of observing the data x, integrated over all possible values of θ.
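A worked example of this integral (not from the cards): for Bernoulli data with s successes in n trials and a Beta(a, b) prior, the evidence has the closed form B(a+s, b+n−s)/B(a, b). The sketch below confirms this by quadrature, assuming scipy:

```python
from scipy.integrate import quad
from scipy.special import beta as B
from scipy.stats import beta as beta_dist

a, b, n, s = 2.0, 3.0, 10, 7   # prior hyperparameters and data summary (arbitrary)

# f(x) = ∫ L(θ, x) π₀(θ) dθ with L(θ, x) = θ^s (1-θ)^(n-s)
evidence, _ = quad(lambda t: t**s * (1 - t)**(n - s) * beta_dist.pdf(t, a, b), 0, 1)
assert abs(evidence - B(a + s, b + n - s) / B(a, b)) < 1e-8
```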
What is the proportionality relationship used for calculating the posterior distribution, ignoring the normalizing constant?
π(θ|x) ∝ L(θ, x) * π₀(θ)
Describe how Bayesian updating works sequentially when a new datum x2 arrives after observing x1.
The posterior after x1, π(θ|x1), becomes the prior for processing x2. The new posterior is π(θ|x1, x2) ∝ f_{X|θ}(x2|θ) * π(θ|x1).
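A minimal sketch of sequential vs. batch updating, using the standard Beta-Bernoulli model as an illustrative assumption (Beta(a, b) prior, Bernoulli data):

```python
a, b = 2.0, 2.0          # prior hyperparameters (arbitrary)
x1, x2 = 1, 0            # two Bernoulli observations

# Sequential: the posterior after x1 becomes the prior for x2.
a1, b1 = a + x1, b + (1 - x1)
a12, b12 = a1 + x2, b1 + (1 - x2)

# Batch: update once with both observations.
a_batch, b_batch = a + x1 + x2, b + 2 - x1 - x2
assert (a12, b12) == (a_batch, b_batch)   # same posterior either way
```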
If T = T(X) is a sufficient statistic for θ, how does this simplify the calculation of the posterior distribution π(θ|x)?
The posterior distribution depends on the data x only through the value of the sufficient statistic T(x). That is, π(θ|x) ∝ g(T(x), θ) * π₀(θ), where L(θ, x) = g(T(x), θ)h(x) by the Factorization Theorem.
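A quick numeric illustration (my own example, assuming numpy): for Bernoulli data, T(x) = Σ xi with g(T, θ) = θ^T (1−θ)^(n−T) and h(x) = 1, so normalizing either form of the likelihood yields the same posterior:

```python
import numpy as np

x = np.array([1, 0, 1, 1, 0])
n, T = len(x), x.sum()
theta = np.linspace(0.001, 0.999, 999)
dtheta = theta[1] - theta[0]
prior = np.ones_like(theta)                     # flat prior, my choice

full = np.prod(theta[:, None]**x * (1 - theta[:, None])**(1 - x), axis=1) * prior
suff = theta**T * (1 - theta)**(n - T) * prior  # depends on x only through T(x)

full /= full.sum() * dtheta
suff /= suff.sum() * dtheta
assert np.allclose(full, suff)                  # identical posteriors
```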
What are the two main computational/analytical challenges mentioned in Bayesian inference related to the posterior and marginal likelihood?
1. Evaluating the marginal likelihood integral f(x) = ∫ L(θ, x) π₀(θ) dθ. 2. Determining the distributional form of the posterior π(θ|x).
What is a conjugate prior family P for a class of likelihood distributions F = {f_{X|θ}(x|θ)}?
P is conjugate for F if, for any prior π₀(θ) ∈ P and any likelihood f_{X|θ}(x|θ) ∈ F, the resulting posterior distribution π(θ|x) is also in the family P.
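An illustrative sketch (assuming numpy/scipy): the Gamma family is conjugate for the Poisson likelihood, a standard example not taken from the cards. With prior Gamma(a, rate b), the posterior after x1..xn is Gamma(a + Σ xi, rate b + n), i.e. it stays in the same family:

```python
import numpy as np
from scipy.stats import gamma

x = np.array([3, 1, 4, 2])                 # Poisson counts (arbitrary)
a, b = 2.0, 1.0                            # prior shape and rate
a_post, b_post = a + x.sum(), b + len(x)   # posterior is again a Gamma

# Grid check: likelihood ∝ θ^Σxi · e^{-nθ}, times the Gamma(a, b) prior.
theta = np.linspace(0.01, 10, 1000)
dtheta = theta[1] - theta[0]
unnorm = theta**x.sum() * np.exp(-len(x) * theta) * gamma.pdf(theta, a, scale=1/b)
unnorm /= unnorm.sum() * dtheta
assert np.allclose(unnorm, gamma.pdf(theta, a_post, scale=1/b_post), atol=1e-3)
```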
What is the main advantage of using a conjugate prior?
It leads to an analytically tractable posterior calculation, meaning the form of the posterior distribution is known and often easy to compute.
Write the general form of a k-parameter exponential family pdf/pmf, f_{X|θ}(x|θ).
f_{X|θ}(x|θ) = h(x) * c(θ) * exp[ Σ_{j=1}^k t_j(x) * w_j(θ) ]
What are the components h(x), c(θ), t_j(x), and w_j(θ) in the exponential family definition?
h(x) is a function of x only; c(θ) is a function of θ only (related to the normalizing constant); t_j(x) are the sufficient statistics; w_j(θ) are functions of the parameters (often called natural parameters).
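A standard worked instance (illustrative, not from the cards): the Bernoulli(θ) pmf written in 1-parameter exponential family form:

```latex
f_{X|\theta}(x|\theta) = \theta^{x}(1-\theta)^{1-x}
  = \underbrace{1}_{h(x)} \,
    \underbrace{(1-\theta)}_{c(\theta)} \,
    \exp\!\Big[ \underbrace{x}_{t_1(x)} \cdot
                \underbrace{\log\tfrac{\theta}{1-\theta}}_{w_1(\theta)} \Big],
  \qquad x \in \{0, 1\}.
```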
When is an exponential family called ‘regular’?
The family is regular if the support of the distribution, denoted by the set X, does not depend on the parameter θ. For example, Uniform(0, θ) is not regular, since its support (0, θ) depends on θ.
What is the form of the conjugate prior π₀(θ) for a parameter θ of a regular k-parameter exponential family likelihood?
π₀(θ) = d(α, β) * [c(θ)]^α * exp[ Σ_{j=1}^k β_j * w_j(θ) ], where α and β = (β1, …, βk) are hyperparameters and d(α, β) is the prior normalizing constant.
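Continuing the Bernoulli instance above (illustrative; valid when β₁ + 1 > 0 and α − β₁ + 1 > 0, so that both Beta parameters are positive), the conjugate form recovers a Beta prior:

```latex
\pi_0(\theta) \propto (1-\theta)^{\alpha}
    \exp\!\Big[ \beta_1 \log\tfrac{\theta}{1-\theta} \Big]
  = \theta^{\beta_1} (1-\theta)^{\alpha-\beta_1},
  \qquad \text{i.e. } \theta \sim \mathrm{Beta}(\beta_1 + 1,\ \alpha - \beta_1 + 1).
```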
Given a sample x = (x1, …, xn) from a regular exponential family and a conjugate prior as defined above, what is the form of the posterior distribution π(θ|x)?
The posterior is proportional to [c(θ)]^(α+n) * exp[ Σ_{j=1}^k (β_j + Σ_{i=1}^n t_j(xi)) * w_j(θ) ]. It has the same form as the prior but with updated hyperparameters.
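The one-line derivation behind this card (the factors Π h(xi) and d(α, β) are constant in θ and get absorbed into the proportionality):

```latex
\pi(\theta|x) \propto
  \Big[ \prod_{i=1}^n h(x_i) \Big] \, c(\theta)^{n}
    \exp\!\Big[ \sum_{j=1}^k \Big( \sum_{i=1}^n t_j(x_i) \Big) w_j(\theta) \Big]
  \cdot c(\theta)^{\alpha}
    \exp\!\Big[ \sum_{j=1}^k \beta_j \, w_j(\theta) \Big]
  \propto c(\theta)^{\alpha+n}
    \exp\!\Big[ \sum_{j=1}^k \Big( \beta_j + \sum_{i=1}^n t_j(x_i) \Big) w_j(\theta) \Big].
```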
How are the hyperparameters (α, β) updated to the posterior hyperparameters (α*, β*) for the conjugate prior of a regular exponential family after observing data x = (x1, …, xn)?
α* = α + n; β_j* = β_j + Σ_{i=1}^n t_j(xi) for j = 1, …, k.
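A small check of this update in the Bernoulli case (my own example, assuming numpy), where t(x) = x, so the rule matches the familiar Beta(a, b) → Beta(a + Σ xi, b + n − Σ xi) update via a = β + 1, b = α − β + 1:

```python
import numpy as np

x = np.array([1, 0, 1, 1, 1, 0])
alpha, beta = 4.0, 2.0                    # prior hyperparameters (arbitrary)

alpha_star = alpha + len(x)               # α* = α + n
beta_star = beta + x.sum()                # β* = β + Σ t(xi), with t(x) = x

# Equivalent Beta parameters before and after the update:
a, b = beta + 1, alpha - beta + 1
a_post, b_post = beta_star + 1, alpha_star - beta_star + 1
assert (a_post, b_post) == (a + x.sum(), b + len(x) - x.sum())
```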