Topic 4: The Bias-Variance Decomposition Flashcards
What does the variance of a model measure
Sensitivity of the trained predictor to the particular training data it was fitted on
What is the bias of a model
How far the cluster of predictions is from the target
Roughly a measure of the strength of the model's assumptions (a more rigid model has higher bias)
High bias -> predictions centred around a point that is not the bullseye target
What is a joint random variable
A pair (x, y) drawn from a joint distribution P(x, y)
Joint = the two variables x and y are drawn together
A training set Sn of n such observations is drawn from P(x, y)^n
What is the expected squared risk
ESn[R(f)] = ESn[ E(x,y)[ (f(x) − y)^2 ] ]
What is ESn
The average over all possible training datasets Sn
What is E(x,y) ~ D
The average over all possible testing points
The random variable (x,y) follows a certain probability distribution D
What is the Bias Variance decomposition for squared risk
ESn[R(f)] = Ex[ noise + bias + variance ]
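The decomposition can be checked numerically. A minimal sketch (hypothetical toy setup, not from the cards): fit a straight line to y = sin(πx) + Gaussian noise on many independent training sets, then compare noise + bias + variance at a fixed test point against a Monte Carlo estimate of the expected risk at that point.

```python
import numpy as np

# Toy setup (an assumption for illustration): y = sin(pi*x) + noise,
# model family = degree-1 polynomials, decomposition checked at x_test.
rng = np.random.default_rng(0)
sigma = 0.3              # noise standard deviation
n_train, n_sets = 20, 2000

def sample_set():
    x = rng.uniform(-1, 1, n_train)
    y = np.sin(np.pi * x) + rng.normal(0, sigma, n_train)
    return x, y

x_test = 0.5
true_mean = np.sin(np.pi * x_test)        # Ey|x[y] at the test point

# f(x_test) for one model per training set Sn
preds = np.array([np.polyval(np.polyfit(*sample_set(), 1), x_test)
                  for _ in range(n_sets)])

noise = sigma ** 2                         # Ey|x[(y - Ey|x[y])^2]
bias = (preds.mean() - true_mean) ** 2     # (ESn[f(x)] - Ey|x[y])^2
variance = preds.var()                     # ESn[(f(x) - ESn[f(x)])^2]

# Expected risk at x_test: squared error against fresh labels
y_fresh = true_mean + rng.normal(0, sigma, n_sets)
risk = np.mean((preds - y_fresh) ** 2)

print(f"noise + bias + variance  = {noise + bias + variance:.4f}")
print(f"Monte Carlo expected risk = {risk:.4f}")
```

The two printed numbers should agree up to Monte Carlo error, since the sum of the three terms is exactly the expected risk at the fixed test point.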
What is the noise term
Ey|x[ (y − Ey|x[y])^2 ]
An irreducible constant, independent of any model parameters
Caused by choice of data/features and not by the model
What is the bias term
(ESn [f(x)] − Ey∣x[y])^2
This is the loss of the expected model against Ey|x[y]
The expected model (ESn [f(x)]) is the average response we would get if we could average over all possible training data sets
What is the variance term
ESn[ (f(x) − ESn[f(x)])^2 ]
Compares a single prediction f(x) (from one training set) with the average ESn[f(x)], then takes the expected squared difference
Captures variation in f due to different training sets, varying around the expected model
Model too flexible -> will grow large
How do you reduce the bias
Increase the flexibility of the model
So increase the model family size
Potentially can be reduced by adding more features
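The trade-off on the two cards above can be seen empirically. A hypothetical sketch (polynomial degree as the flexibility knob, same toy sin-plus-noise setup as assumed, not from the cards): as the degree grows, bias at a test point shrinks while variance across training sets grows.

```python
import numpy as np

# Assumption for illustration: flexibility = polynomial degree, fitted
# to y = sin(pi*x) + noise; bias and variance measured at one test point.
rng = np.random.default_rng(1)
sigma, n_train, n_sets, x_test = 0.3, 20, 500, 0.5
true_mean = np.sin(np.pi * x_test)

results = {}
for degree in (1, 3, 12):
    preds = []
    for _ in range(n_sets):
        x = rng.uniform(-1, 1, n_train)
        y = np.sin(np.pi * x) + rng.normal(0, sigma, n_train)
        preds.append(np.polyval(np.polyfit(x, y, degree), x_test))
    preds = np.array(preds)
    bias = (preds.mean() - true_mean) ** 2   # shrinks with flexibility
    variance = preds.var()                   # grows with flexibility
    results[degree] = (bias, variance)
    print(f"degree {degree:2d}: bias = {bias:.4f}, variance = {variance:.4f}")
```

The rigid degree-1 model has large bias (it cannot bend to follow the sine), while the flexible degree-12 model has small bias but much larger variance, matching the "model too flexible -> variance grows large" card.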
How do you reduce the noise
Can only reduce it by getting better quality labelled data (not by increasing data size)
It is equal to R(y*), the Bayes risk, where y* = Ey|x[y] is the optimal predictor under squared loss
How do you reduce the variance
Potentially by:
Increasing the number of training examples
Adding some regularization to the model
Bagging algorithm
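Bagging is the least obvious of the three, so a minimal sketch (hypothetical setup, not from the cards): train a high-variance base learner (here a 1-nearest-neighbour rule) on bootstrap resamples of each training set and average the predictions. The averaged predictor varies less from one training set to the next.

```python
import numpy as np

# Assumption for illustration: base learner = 1-nearest-neighbour
# prediction at a fixed test point, on y = sin(pi*x) + noise data.
rng = np.random.default_rng(3)
sigma, n_train, n_sets, n_bootstrap, x_test = 0.3, 30, 400, 25, 0.5

def nn_predict(x, y):
    """1-nearest-neighbour prediction at x_test (a high-variance learner)."""
    return y[np.argmin(np.abs(x - x_test))]

single, bagged = [], []
for _ in range(n_sets):
    x = rng.uniform(-1, 1, n_train)
    y = np.sin(np.pi * x) + rng.normal(0, sigma, n_train)
    single.append(nn_predict(x, y))
    # Bagging: resample (x, y) with replacement, average the predictions
    boot_preds = [nn_predict(x[idx], y[idx])
                  for idx in (rng.integers(0, n_train, n_train)
                              for _ in range(n_bootstrap))]
    bagged.append(np.mean(boot_preds))

print(f"variance of single model: {np.var(single):.4f}")
print(f"variance of bagged model: {np.var(bagged):.4f}")
```

Averaging over bootstrap resamples smooths out the base learner's sensitivity to individual training points, which is exactly the variance term of the decomposition.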
Which losses does the bias-variance decomposition hold for
Squared loss (the classical form)
Cross entropy loss (a generalised form of the decomposition)
What is the relationship between bias-variance decomposition and approximation-estimation decomposition
They are not equal but strongly related
The noise term is equal to the Bayes risk
What is the most common loss function used to train neural networks
Cross entropy
What does Ey|x[y] mean
The average value of y given that the input takes a particular value x (the conditional expectation of y given x)