Topic 2: Empirical Risk Minimisation Flashcards
What is F
The chosen family of models for a problem
How is F related to Ω
F is always a subset of Ω, because our choice of models is always restricted
What is Ω
The space of all possible models
What is Empirical Risk minimisation generally
A mathematical framework for choosing a model by minimising its average loss on the training data, used to understand the theory of over-fitting
What is the input space
X = R^d
Real-valued vectors of dimension d
What is the output space
Y = R^k
Real-valued vectors of dimension k
How is a datapoint denoted
(x, y) ∈ R^d × R^k
What is IID
“Independent and Identically distributed”
Each datapoint (x,y) is assumed to be an independent sample from the distribution D = P(x,y)
What is Sn
The overall dataset
Assumed to be a sample from a joint random variable P(x,y)^n (sampling n times, independently)
How is the loss function denoted mathematically
l: R^k × R^k -> [0, inf)
It takes two labels, the true label and the model prediction, and returns a loss value from 0 upwards (not including infinity)
What is always the first argument of the loss function
The true label
Written as l(y, f(x))
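A minimal sketch of one such loss, the squared loss, with the true label as the first argument. The function name `squared_loss` and the example values are illustrative assumptions, not part of the original notes.

```python
from typing import Sequence


def squared_loss(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Squared loss l(y, f(x)) = ||y - f(x)||^2 for labels in R^k.

    The true label comes first and the model prediction second.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("both labels must have the same dimension k")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred))


# Example with k = 2: (1 - 0)^2 + (2 - 4)^2
loss = squared_loss([1.0, 2.0], [0.0, 4.0])
print(loss)  # 5.0
```

The result is always at least 0 and is finite for finite inputs, matching the range [0, inf).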
What is the empirical risk minimiser
f_erm = arginf_{f ∈ F} R̂(f, Sn)
The model obtained if we could find the global minimum of the empirical risk on a data sample Sn, when we are restricted to the model family F
(the empirical risk R̂(f, Sn) is the average loss of f over Sn, i.e. an estimate of the risk from an IID sample of size n)
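As a rough illustration, the empirical risk and its minimiser can be computed directly when F is a small finite family. Everything below (the three-line family, the dataset values, the names `empirical_risk` and `family`) is a hypothetical toy setup with d = k = 1, chosen only to make the definition concrete.

```python
from typing import Callable, Dict, List, Tuple

Model = Callable[[float], float]
Dataset = List[Tuple[float, float]]


def squared_loss(y_true: float, y_pred: float) -> float:
    # True label first, prediction second.
    return (y_true - y_pred) ** 2


def empirical_risk(f: Model, sample: Dataset) -> float:
    """Average loss of model f over the sample Sn."""
    return sum(squared_loss(y, f(x)) for x, y in sample) / len(sample)


# Hypothetical model family F: three lines through the origin.
family: Dict[str, Model] = {
    "slope=0": lambda x: 0.0,
    "slope=1": lambda x: x,
    "slope=2": lambda x: 2.0 * x,
}

# Hypothetical sample Sn with n = 3 datapoints (x, y).
sample: Dataset = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2)]

risks = {name: empirical_risk(f, sample) for name, f in family.items()}

# With a finite family the arginf is just the member with the smallest risk.
erm_name = min(risks, key=risks.get)
print(erm_name)  # slope=2
```

For an infinite family such as all linear models, the minimum usually has to be found by optimisation rather than enumeration, and the global minimum may not be reachable in practice.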
What is population risk
The true risk: the expected loss of a model over the whole data distribution D, including data we may never encounter
Think of it as the empirical risk in the limit as n goes to infinity
Also known as the ‘generalisation error’: the error that a model f would make, on average, over all possible inputs
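The population risk can only be computed when the distribution D is known, which never happens in practice. The sketch below assumes a made-up toy distribution (x uniform on [-1, 1], y = 2x plus standard Gaussian noise) so that the true risk of the model f(x) = 2x under squared loss is known to be 1, the noise variance. It then compares empirical risks on a small and a large IID sample. All names and values are assumptions for illustration.

```python
import random
from typing import Callable, List, Tuple

Dataset = List[Tuple[float, float]]


def sample_dataset(n: int, rng: random.Random) -> Dataset:
    """Draw n IID datapoints from the toy distribution D."""
    data = []
    for _ in range(n):
        x = rng.uniform(-1.0, 1.0)
        y = 2.0 * x + rng.gauss(0.0, 1.0)  # noise has variance 1
        data.append((x, y))
    return data


def empirical_risk(f: Callable[[float], float], sample: Dataset) -> float:
    """Average squared loss of f over the sample."""
    return sum((y - f(x)) ** 2 for x, y in sample) / len(sample)


def model(x: float) -> float:
    return 2.0 * x


rng = random.Random(0)  # fixed seed so the run is repeatable
risk_small_n = empirical_risk(model, sample_dataset(10, rng))
risk_large_n = empirical_risk(model, sample_dataset(200_000, rng))

print("n = 10:     ", risk_small_n)
print("n = 200000: ", risk_large_n)  # should be close to the true risk of 1
```

The small-sample estimate can be far from 1, while the large-sample estimate should sit close to it, which is the sense in which the population risk corresponds to n going to infinity.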
The Population risk minimiser in F
f* = arginf_{f ∈ F} R(f)
The * indicates optimality
“Best in family” model
(now assuming we have infinite data)
It is a hypothetical model, the optimal model with no restrictions on the data
The population risk minimiser in Ω
y* = arginf_{f ∈ Ω} R(f)
Also known as the Bayes model or Bayes prediction (not to be confused with Naive Bayes)
This is a hypothetical model, the optimal model with no restriction on the model family and no restrictions on the data
Minimises the risk over all possible models (not restricted to a family)
What is the population risk minimiser for squared loss
For squared loss y*(x) = E_{y∣x}[y]
(the expected value of y given x)
this is not computable as we do not have access to the full data distribution
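A short derivation sketch of why the conditional mean is optimal for squared loss. The symbols c (any candidate prediction for a fixed x) and μ(x) (shorthand for the conditional mean) are introduced here for convenience and do not appear in the notes above.

```latex
% Fix an input x, let c \in \mathbb{R}^k be any candidate prediction,
% and write \mu(x) = \mathbb{E}_{y \mid x}[y].
\begin{align*}
\mathbb{E}_{y \mid x}\!\left[\lVert y - c \rVert^2\right]
  &= \mathbb{E}_{y \mid x}\!\left[\lVert (y - \mu(x)) + (\mu(x) - c) \rVert^2\right] \\
  &= \mathbb{E}_{y \mid x}\!\left[\lVert y - \mu(x) \rVert^2\right]
     + 2\,\mathbb{E}_{y \mid x}\!\left[y - \mu(x)\right]^{\top} (\mu(x) - c)
     + \lVert \mu(x) - c \rVert^2 \\
  &= \mathbb{E}_{y \mid x}\!\left[\lVert y - \mu(x) \rVert^2\right]
     + \lVert \mu(x) - c \rVert^2 .
\end{align*}
% The cross term vanishes because \mathbb{E}_{y \mid x}[y - \mu(x)] = 0.
% The first term does not depend on c, and the second is minimised
% (equal to zero) exactly when c = \mu(x), so y^*(x) = \mathbb{E}_{y \mid x}[y].
```

The remaining first term is the conditional variance of y given x, which is the risk that even the Bayes model cannot remove.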