Week 1 Flashcards
Define likelihood
In machine learning, likelihood is a measure of how likely it is that a given model would have produced the observed data. It is a function of the model’s parameters and is used to evaluate the fit of the model to the data.
How do you calculate the likelihood?
Assuming the data points are independent and identically distributed (i.i.d.), it is calculated by taking the product of the probabilities (or densities) of each individual data point, given the model and its parameters.
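As a minimal sketch of this, assuming a univariate Gaussian model (the data values and parameters below are made up purely for illustration):

```python
import numpy as np
from scipy.stats import norm

# Toy data and a candidate Gaussian model with parameters mu, sigma.
data = np.array([1.2, 0.8, 1.5, 1.1])
mu, sigma = 1.0, 0.5

# Likelihood: product of the density of each point under the model
# (this relies on the i.i.d. assumption).
likelihood = np.prod(norm.pdf(data, loc=mu, scale=sigma))

# In practice we use the log-likelihood (a sum instead of a product)
# to avoid numerical underflow on large datasets.
log_likelihood = np.sum(norm.logpdf(data, loc=mu, scale=sigma))

print(likelihood, log_likelihood)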
How do you derive the bias-variance decomposition (either the derivation or a high-level explanation)?
see image
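The image isn't reproduced here. As a sketch, assuming the standard squared-error setting with h denoting the model's average prediction E[f(x)]: expand E[(f − t)^2] by adding and subtracting E[f] and E[t]; the cross terms vanish in expectation (f and t are independent, and each deviation has zero mean), leaving

```latex
\[
  \mathbb{E}\big[(f(x) - t)^2\big]
  = \underbrace{\mathbb{E}\big[(f(x) - \mathbb{E}[f(x)])^2\big]}_{\mathrm{Var}[f]\ \text{(model variance)}}
  + \underbrace{\big(h - \mathbb{E}[t]\big)^2}_{\text{bias}^2}
  + \underbrace{\mathbb{E}\big[(t - \mathbb{E}[t])^2\big]}_{\mathrm{Var}[t]\ \text{(noise)}}
\]
```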
Explain each individual term of the bias-variance decomposition
-Var[t] (often written σ^2) is the noise variance inherent in the data; the greater the noise in the underlying data, the worse any model is expected to perform.
-Var[f] represents the model's sensitivity to changes (i.e. its variance) across different training sets. If Var[t] is large and the model is flexible (e.g. a high-order polynomial), then training on different folds, as in cross-validation, may produce many different fitted models.
-(h − E[t])^2 is the squared bias: it measures how far the model's average prediction h is from the true expected target, i.e. the model's ability to represent the underlying data accurately. High bias means the model generalises poorly (it underfits).
In the context of the bias-variance decomposition, what makes a successful model?
Ideally, we want to minimise both Var[f] and (h − E[t])^2. However, there is always a trade-off between the two: decreasing Var[f] often increases (h − E[t])^2 and vice versa, so we want a good balance between them when selecting a model.
What does a Bayesian interpretation of the likelihood of obtaining some parameter given the dataset show about the difficulty of naive regression?
It implicitly assumes a flat prior: every parameter value w is equally likely a priori, which lends itself to overfitting. E.g. a weight of w_10 = 97364 is treated as just as plausible as w_10 = 0.5.
How can we prevent overfitting using a prior distribution approach?
Constrain the parameters by placing a (zero-mean) Gaussian prior on them. After expanding the log-posterior, you end up with L2 regularisation (as sketched below).
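As a sketch of that expansion, assuming a Gaussian likelihood with noise precision β and a zero-mean Gaussian prior with precision α (symbols introduced here for illustration, not from the card):

```latex
\[
  \ln p(\mathbf{w} \mid \mathbf{t})
  = \ln p(\mathbf{t} \mid \mathbf{w}) + \ln p(\mathbf{w}) + \text{const}
  = -\frac{\beta}{2}\sum_{n}\big(t_n - \mathbf{w}^T\phi(x_n)\big)^2
    - \frac{\alpha}{2}\,\mathbf{w}^T\mathbf{w} + \text{const}
\]
```

Maximising this log-posterior is equivalent to minimising the sum-of-squares error plus an L2 penalty with λ = α/β, i.e. ridge regression.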
What are the normal equations for a regression problem with and without L2 regularisation?
Without regularisation: Φ^T Φ w − Φ^T t = 0
With L2 regularisation: (Φ^T Φ + λI) w − Φ^T t = 0
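A quick numerical sketch of both equations (the design matrix, targets, and λ below are arbitrary illustrative values):

```python
import numpy as np

rng = np.random.default_rng(0)
Phi = rng.normal(size=(50, 4))   # design matrix: 50 points, 4 basis functions
t = rng.normal(size=50)          # targets
lam = 0.1                        # regularisation strength lambda

# Without regularisation: solve Phi^T Phi w = Phi^T t
w_ml = np.linalg.solve(Phi.T @ Phi, Phi.T @ t)

# With L2 regularisation: solve (Phi^T Phi + lambda I) w = Phi^T t
w_ridge = np.linalg.solve(Phi.T @ Phi + lam * np.eye(Phi.shape[1]), Phi.T @ t)

print(w_ml)
print(w_ridge)
```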