Week 1 Flashcards
What is the primary goal of statistical inference as described in the notes?
To draw conclusions about the population, specifically its distribution F_X(x), based on sample data (X1, X2, …, Xn).
In frequentist inference, what are the two main types of models considered for the population distribution F_X(x)?
1. Non-parametric: F_X(x) is completely unknown. 2. Parametric: The form F_X(x; θ) is known, but the parameter θ is unknown.
In parametric frequentist inference, what is the target of the inference?
The target is to draw conclusions about the unknown parameter θ using the sample data (X1, X2, …, Xn).
Define what it means for an infinite sequence of random variables X1, X2, … to be exchangeable.
An infinite sequence X1, X2, … is exchangeable if for any n ≥ 1, the joint distribution of (X1, …, Xn) is the same as the joint distribution of (X_τ(1), …, X_τ(n)) for any permutation τ of the indices (1, …, n).
Express the definition of exchangeability for a finite set X1, …, Xn using joint probabilities and sets A1, …, An.
For any n ≥ 1 and sets A1, …, An ⊆ R, P(∩_{j=1}^n {Xj ∈ Aj}) = P(∩_{j=1}^n {X_τ(j) ∈ Aj}) for all permutations τ of (1, …, n).
Express the definition of exchangeability for a finite set X1, …, Xn using cumulative distribution functions (CDFs).
For all (x1, …, xn) ∈ R^n and all permutations τ of (1, …, n), F_{X1,…,Xn}(x1, …, xn) = F_{X_τ(1),…,X_τ(n)}(x1, …, xn); equivalently, F_{X1,…,Xn}(x1, …, xn) = F_{X1,…,Xn}(x_τ(1), …, x_τ(n)).
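As an illustration (hypothetical, not taken from the notes), the set-based definition can be checked by Monte Carlo for an assumed mixture: T ~ N(0, 1) and X1, X2 | T = t i.i.d. N(t, 1). The sets A1 = (−∞, 0], A2 = (−∞, 1] and all numerical settings below are arbitrary choices for the sketch.

```python
# Hypothetical sketch (assumed model: T ~ N(0, 1); X1, X2 | T = t i.i.d. N(t, 1)).
# Estimates P(X1 <= 0, X2 <= 1) and the permuted probability P(X2 <= 0, X1 <= 1);
# under exchangeability the two agree up to Monte Carlo error.
import numpy as np

rng = np.random.default_rng(0)
n_sim = 200_000

t = rng.normal(0.0, 1.0, size=n_sim)   # draw the latent variable T
x1 = rng.normal(t, 1.0)                # X1 | T = t
x2 = rng.normal(t, 1.0)                # X2 | T = t

p_original = np.mean((x1 <= 0.0) & (x2 <= 1.0))   # P(X1 in A1, X2 in A2)
p_permuted = np.mean((x2 <= 0.0) & (x1 <= 1.0))   # P(X_tau(1) in A1, X_tau(2) in A2)
print(p_original, p_permuted)                     # should be approximately equal
```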
What is the core statement of De Finetti’s representation theorem?
An infinite sequence X1, X2, … is exchangeable if and only if its joint distribution can be represented as a mixture of independent and identically distributed (i.i.d.) distributions, conditioned on some random variable T.
Provide the mathematical representation given by De Finetti’s theorem for the joint probability P(∩_{j=1}^n {Xj ∈ Aj}).
P(∩_{j=1}^n {Xj ∈ Aj}) = ∫ [ Π_{j=1}^n P(Xj ∈ Aj | T = t) ] dF_T(t), where T is a random variable with distribution F_T.
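As a hypothetical worked instance (not taken from the notes), suppose T ~ Beta(a, b) and Xj | T = t are i.i.d. Bernoulli(t). The representation then gives P(X1 = 1, X2 = 1) = ∫_0^1 t^2 dF_T(t) = E[T^2] = a(a+1) / [(a+b)(a+b+1)], which is larger than P(X1 = 1) P(X2 = 1) = a^2 / (a+b)^2: the Xj are conditionally independent given T, but not independent marginally.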
According to the notes, how can the random variable T in De Finetti’s theorem be interpreted?
T can be interpreted as a random variable obtained as the limit, as n → ∞, of some function of (X1, …, Xn).
What is the key consequence of De Finetti’s theorem mentioned regarding the joint marginal distribution of X1, …, Xn?
The joint marginal distribution of the observable quantities X1, …, Xn can be represented via a conditional/marginal decomposition (integrating out the latent variable T).
How does De Finetti’s theorem provide a theoretical justification for using Bayesian methods?
It shows that the assumption of exchangeability for observable random quantities naturally leads to a structure where the data are conditionally independent given some latent variable (parameter), which is central to Bayesian modeling.
Describe the construction process for exchangeable random variables suggested by De Finetti’s theorem.
1. Sample a value t for the latent variable T from its distribution f_T(t). 2. Sample X1, …, Xn independently from the conditional distribution f_{X|T}(x|t). (See the sketch below.)
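A minimal sketch of this two-step construction, assuming (hypothetically, not from the notes) f_T = Beta(2, 5) and f_{X|T}(·|t) = Bernoulli(t):

```python
# Minimal sketch of the two-step construction of exchangeable draws.
# Assumptions (hypothetical, not from the notes): T ~ Beta(2, 5) and
# Xj | T = t ~ Bernoulli(t), i.i.d. across j.
import numpy as np

def sample_exchangeable(n, a=2.0, b=5.0, rng=None):
    """Return one exchangeable sample (X1, ..., Xn) of 0/1 values."""
    rng = rng or np.random.default_rng()
    t = rng.beta(a, b)                  # step 1: draw t from f_T(t)
    return rng.binomial(1, t, size=n)   # step 2: draw X1, ..., Xn i.i.d. from f_{X|T}(x|t)

print(sample_exchangeable(10, rng=np.random.default_rng(1)))
```

Averaging over many such samples (redrawing t each time) reproduces the mixture in De Finetti's representation.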
How does the frequentist approach view the parameter of interest θ?
θ is an unknown, but fixed, constant number.
How does the Bayesian approach view the parameter of interest θ?
θ is treated as a random variable, possessing a probability distribution (the prior distribution, π0(θ)) that reflects beliefs about θ before observing data.
How does the frequentist approach typically model the sample data (X1, …, Xn)?
As a set of independent and identically distributed (i.i.d.) random variables drawn from a distribution f_X(x; θ) parameterized by the fixed θ.
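In density form (not written out in the notes, but immediate from the i.i.d. assumption), the joint distribution of the sample factorizes as f_{X1,…,Xn}(x1, …, xn; θ) = Π_{j=1}^n f_X(xj; θ), with θ a fixed, unknown constant.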
How does the Bayesian approach typically model the sample data (X1, …, Xn)?
As an exchangeable sequence of random variables. Conditionally on the random variable θ, the sample (X1, …, Xn) | θ are often assumed to be independent (and typically identically distributed) from a distribution f_{X|θ}(x|θ).
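In density form (again not written out in the notes, but following the conditional/marginal decomposition above), the joint distribution of the sample is f_{X1,…,Xn}(x1, …, xn) = ∫ [ Π_{j=1}^n f_{X|θ}(xj|θ) ] π0(θ) dθ, which is the De Finetti representation with θ playing the role of the latent variable T.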
What is the relationship between exchangeability and the i.i.d. assumption in the Bayesian framework described?
Exchangeability is the weaker, more fundamental assumption about the observables. De Finetti’s theorem shows that exchangeability implies the data behave as if they were i.i.d. conditional on some latent quantity (which plays the role of the parameter θ).
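A small simulation (hypothetical Beta-Bernoulli setup, not from the notes) makes the distinction concrete: marginally X1 and X2 are positively correlated, so the sequence is not i.i.d., yet conditionally on T = t the correlation is (approximately) zero.

```python
# Hypothetical sketch (assumed model: T ~ Beta(2, 2); Xj | T = t i.i.d. Bernoulli(t)).
# Marginally X1 and X2 are dependent; given T = t they are independent.
import numpy as np

rng = np.random.default_rng(0)
n_sim = 200_000

t = rng.beta(2.0, 2.0, size=n_sim)   # latent variable T
x1 = rng.binomial(1, t)              # X1 | T = t
x2 = rng.binomial(1, t)              # X2 | T = t
print(np.corrcoef(x1, x2)[0, 1])     # clearly positive: marginal dependence

fixed_t = 0.3                        # condition on one value T = t (arbitrary choice)
y1 = rng.binomial(1, fixed_t, size=n_sim)
y2 = rng.binomial(1, fixed_t, size=n_sim)
print(np.corrcoef(y1, y2)[0, 1])     # approximately zero: conditional independence
```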