Week 1 Flashcards

1
Q

Define likelihood

A

In machine learning, the likelihood measures how probable the observed data are under a given model. It is a function of the model’s parameters (with the data held fixed) and is used to evaluate how well the model fits the data.
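
As a formula (standard notation, assumed here: θ for the model parameters, D for the observed dataset):

```latex
% The likelihood is the probability of the data under the model,
% read as a function of the parameters with the data held fixed.
L(\theta) = p(D \mid \theta)
```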

2
Q

How do you calculate the likelihood?

A

Assuming the data points are independent, it is calculated by taking the product of the probabilities of each individual data point, given the model and its parameters.
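
A minimal sketch in Python (assuming i.i.d. data under a Gaussian model; the data and parameter values below are invented for illustration):

```python
import numpy as np
from scipy.stats import norm

# Toy dataset and candidate Gaussian parameters (illustrative values only).
data = np.array([1.2, 0.8, 1.5, 1.1])
mu, sigma = 1.0, 0.5

# Likelihood: product of the per-point densities under the model.
likelihood = np.prod(norm.pdf(data, loc=mu, scale=sigma))

# The raw product underflows for large datasets, so in practice the
# log-likelihood (a sum of log-densities) is computed instead.
log_likelihood = np.sum(norm.logpdf(data, loc=mu, scale=sigma))
print(likelihood, log_likelihood)
```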

3
Q

How do you derive the bias-variance decomposition (either the derivation or a high-level explanation)?

A

see image
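
A high-level sketch of the standard squared-loss derivation (notation follows the later cards; it assumes f is the learned model, t the target, and h = E[f] the average prediction over training sets):

```latex
% Expected squared error between a model f (trained on a random dataset)
% and a target t. Add and subtract h = E[f]; the cross term vanishes
% because E[f - h] = 0 and f is independent of the new target t.
E[(f - t)^2] = E[(f - h)^2] + E[(h - t)^2]
             = \underbrace{\mathrm{Var}[f]}_{\text{variance}}
             + \underbrace{(h - E[t])^2}_{\text{bias}^2}
             + \underbrace{\mathrm{Var}[t]}_{\text{noise } \sigma^2}
```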

4
Q

Explain each individual term in the bias-variance decomposition

A

-Var[t], or σ², is the variance (noise) within the data; the greater the noise in the underlying data, the worse the model is expected to perform

-Var[f] represents the model’s sensitivity to changes (or variance) in the underlying data. If Var[t] is large and we have e.g. a high-order polynomial model, then training the model on different folds, as in cross-validation, may produce many variations in the fitted model

-(h − E[t])² is the squared bias term: it measures the model’s ability to capture the underlying structure of the data accurately. It is referred to as the bias; high bias means poor generalisation

5
Q

In the context of bias-variance decomposition, what makes a successful model

A

Ideally, we want to minimise both Var[f] and (h − E[t])². However, there is always a trade-off between the two.

Decreasing Var[f] often increases (h − E[t])² and vice versa, so we want a good trade-off between the two when selecting a model.
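
A small simulation sketch of the trade-off (all setup values invented for illustration): fitting a simple and a flexible polynomial to many noisy training sets, the simple model shows high bias and low variance, the flexible one the reverse.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: noisy samples of a sine curve; compare a low-degree
# and a high-degree polynomial fit across many independent training sets.
def true_fn(x):
    return np.sin(2 * np.pi * x)

x_test = np.linspace(0, 1, 50)
n_runs, n_train, noise_sd = 200, 20, 0.3

for degree in (1, 9):
    preds = np.empty((n_runs, x_test.size))
    for i in range(n_runs):
        x = rng.uniform(0, 1, n_train)
        t = true_fn(x) + rng.normal(0, noise_sd, n_train)
        coeffs = np.polyfit(x, t, degree)        # least-squares polynomial fit
        preds[i] = np.polyval(coeffs, x_test)
    h = preds.mean(axis=0)                        # average model, h = E[f]
    bias2 = np.mean((h - true_fn(x_test)) ** 2)   # (h - E[t])^2, averaged over x
    var_f = np.mean(preds.var(axis=0))            # Var[f], averaged over x
    print(f"degree {degree}: bias^2 = {bias2:.3f}, Var[f] = {var_f:.3f}")
```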

6
Q

What does a Bayesian interpretation of the likelihood of obtaining some parameter given the dataset show about the difficulty of naive regression?

A

It assumes every parameter value w is equally likely, which makes it prone to overfitting. E.g. a weight of w₁₀ = 97364 is considered just as likely a priori as w₁₀ = 0.5.
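
In symbols (a standard identity, stated in this deck’s notation): with a flat prior, the posterior is proportional to the likelihood alone, so no parameter magnitude is penalised:

```latex
% Bayes' rule: posterior proportional to likelihood times prior.
p(w \mid t) \propto p(t \mid w)\, p(w)
% If p(w) is constant (flat), maximising the posterior is exactly
% maximum likelihood; extreme values of w carry no penalty.
\arg\max_w \, p(w \mid t) = \arg\max_w \, p(t \mid w)
```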

7
Q

How can we prevent overfitting using a prior distribution approach

A

Constrain the parameters to fall within a Gaussian (prior) distribution. After expanding the negative log posterior, you end up with L2 regularisation.
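
The expansion, sketched (assumptions of this sketch: a Gaussian likelihood with noise variance σ² and a zero-mean Gaussian prior with variance τ²):

```latex
% Negative log posterior = negative log likelihood + negative log prior.
-\log p(w \mid t) = \frac{1}{2\sigma^2}\lVert t - \Phi w\rVert^2
                  + \frac{1}{2\tau^2}\lVert w\rVert^2 + \text{const}
% Multiplying through by 2\sigma^2 gives the L2-regularised objective
% with \lambda = \sigma^2/\tau^2:
\lVert t - \Phi w\rVert^2 + \lambda\lVert w\rVert^2
```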

8
Q

What are the normal equations for a regression problem with and without L2 regularisation

A

Without regularisation: ΦᵀΦw* − Φᵀt = 0
With L2 regularisation: Φᵀt − (ΦᵀΦ + λI)w* = 0
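
A minimal sketch of solving both systems (the design matrix Φ and targets t below are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy design matrix (polynomial features of random inputs) and targets.
x = rng.uniform(0, 1, 30)
Phi = np.vander(x, N=4, increasing=True)   # columns: 1, x, x^2, x^3
t = np.sin(2 * np.pi * x) + rng.normal(0, 0.1, x.size)

# Without regularisation: solve  Phi^T Phi w = Phi^T t.
w_ols = np.linalg.solve(Phi.T @ Phi, Phi.T @ t)

# With L2 regularisation: solve  (Phi^T Phi + lambda I) w = Phi^T t.
lam = 0.1
w_ridge = np.linalg.solve(Phi.T @ Phi + lam * np.eye(Phi.shape[1]), Phi.T @ t)

print(w_ols, w_ridge)
```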
