6) Optimality properties and conclusion Flashcards
What is a summary statistic T(D) and how can it vary in its representation
A function of the observed data D = {x_1, …, x_n} designed to describe key characteristics of the data. It can take various forms: scalar values, vectors, matrices, etc.
Summary statistics typically focus on important data properties like the following (see the sketch after this list):
* Location: For example, the mean (x̄) or the median.
* Scale: Such as the standard deviation or the interquartile range.
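A minimal sketch of these location and scale summaries, assuming NumPy and a made-up sample:

```python
import numpy as np

# Hypothetical sample; any 1-D array of observations works here.
D = np.array([2.1, 3.5, 2.9, 4.0, 3.2, 2.7])

# Location summaries
mean = D.mean()        # x̄, the sample mean
median = np.median(D)

# Scale summaries
sd = D.std(ddof=1)     # sample standard deviation
iqr = np.percentile(D, 75) - np.percentile(D, 25)  # interquartile range

print(mean, median, sd, iqr)
```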
When is a statistic sufficient
- If the corresponding likelihood function can be written using only t(D) in the terms that involve θ, such that
- L(θ|D) = h(t(D), θ) g(D),
where h() and g() are positive-valued functions, or equivalently on the log scale
- ℓ_n(θ) = log h(t(D), θ) + log g(D)
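As a sketch of this factorisation, take the standard i.i.d. Bernoulli(θ) example (not stated on the card): t(D) = Σ x_i is sufficient, with g(D) = 1:

```latex
L(\theta \mid D)
  = \prod_{i=1}^{n} \theta^{x_i} (1-\theta)^{1-x_i}
  = \underbrace{\theta^{t(D)} (1-\theta)^{n-t(D)}}_{h(t(D),\,\theta)}
    \cdot \underbrace{1}_{g(D)},
\qquad t(D) = \sum_{i=1}^{n} x_i .
```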
How does the existence and uniqueness of the MLE relate to sufficient statistics
If the MLE exists and is unique, then θ̂_ML is a unique function of the sufficient statistic
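Continuing the Bernoulli sketch above (and assuming 0 < t(D) < n so the maximum is interior), the MLE depends on the data only through t(D):

```latex
\hat{\theta}_{\mathrm{ML}}
  = \arg\max_{\theta \in (0,1)} \theta^{t(D)} (1-\theta)^{n-t(D)}
  = \frac{t(D)}{n} .
```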
How does a sufficient statistic partition the space of data sets
A sufficient statistic effectively partitions the space of all possible data sets into clusters, where each cluster contains data sets that result in the same value of T(D). This partitioning is represented by:
𝒳_t = {D : T(D) = t}
The data sets in 𝒳_t are equivalent in terms of the sufficient statistic
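A toy sketch of this partition, assuming binary data of size n = 3 with T(D) = Σ x_i:

```python
from itertools import product
from collections import defaultdict

# Enumerate all binary data sets of size n = 3 and cluster them by
# T(D) = sum(D) (a toy sufficient statistic, e.g. for Bernoulli data).
clusters = defaultdict(list)
for D in product([0, 1], repeat=3):
    clusters[sum(D)].append(D)

for t, X_t in sorted(clusters.items()):
    print(t, X_t)  # X_t holds all data sets with T(D) = t
```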
What does it mean for two data sets to be likelihood equivalent
Two data sets D_1 and D_2 for which the ratio of the corresponding likelihoods L(θ|D_1)/L(θ|D_2) does not depend on θ
Is 𝒳_t likelihood equivalent
Yes, all data sets in 𝒳_t are likelihood equivalent
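A sketch of why, via the factorisation criterion: for D_1, D_2 ∈ 𝒳_t the θ-dependent factor h(t, θ) is common and cancels:

```latex
\frac{L(\theta \mid D_1)}{L(\theta \mid D_2)}
  = \frac{h(t, \theta)\, g(D_1)}{h(t, \theta)\, g(D_2)}
  = \frac{g(D_1)}{g(D_2)},
\qquad t = t(D_1) = t(D_2),
```

which is free of θ.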
What defines a minimal sufficient statistic
A minimal sufficient statistic is defined as a sufficient statistic for which all likelihood equivalent data sets are also equivalent under this statistic
What is a trivial example of a minimal sufficient statistic
The likelihood function itself, since by definition it can be computed from any set of sufficient statistics
What are the differences between forward KL and reverse KL divergence minimisation
- Forward KL divergence, min_θ D_KL(F_0, F_θ):
zero-avoiding property: p_θ(x) > 0 whenever p_0(x) > 0
- Reverse KL divergence, min_θ D_KL(F_θ, F_0):
zero-forcing property: p_θ(x) = 0 whenever p_0(x) = 0
(both properties are illustrated in the sketch after this list)
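A minimal sketch of both properties on discrete distributions, assuming NumPy; p0 and q are made-up:

```python
import numpy as np

def kl(p, q):
    """D_KL(p || q) for discrete distributions; inf if q = 0 somewhere p > 0."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    mask = p > 0
    if np.any(q[mask] == 0):
        return np.inf
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

p0 = [0.5, 0.5, 0.0]  # "true" distribution with an empty cell
q  = [0.4, 0.3, 0.3]  # model placing mass on that empty cell

# Forward KL D_KL(p0 || q): finite here, but would be inf if q missed
# any of p0's support -- minimising it avoids zeros (zero avoiding).
print(kl(p0, q))
# Reverse KL D_KL(q || p0): inf because q > 0 where p0 = 0 -- minimising
# it forces the model's zeros to match p0's (zero forcing).
print(kl(q, p0))
```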
How does a small sample size n affect the reliability of MLE and what are alternative strategies
MLE can overfit, meaning it matches the particularities of the small data set too closely, leading to poor generalization to the broader population.
Alternative methods:
* Regularised/penalised likelihood (a minimal sketch follows below)
* Bayesian methods
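A minimal sketch of the penalised-likelihood idea, assuming SciPy and a made-up Gaussian sample (lam is a hypothetical penalty weight):

```python
import numpy as np
from scipy.optimize import minimize_scalar

D = np.array([2.3, 1.9, 2.8])  # small sample, n = 3
lam = 1.0                      # hypothetical penalty weight

def penalised_nll(theta):
    # Negative Gaussian log-likelihood (unit variance, constants dropped)
    # plus an L2 penalty that shrinks theta towards 0 to curb overfitting.
    return 0.5 * np.sum((D - theta) ** 2) + lam * theta ** 2

mle = D.mean()                          # plain MLE: the sample mean
pen = minimize_scalar(penalised_nll).x  # penalised estimate, shrunk towards 0
print(mle, pen)
```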