week 2 - regression Flashcards
what are four reasons why overfitting might occur?
The model is too complex
There is not enough training data
The testing data are drawn from a different distribution to the training data
We have highly correlated variables in multidimensional datasets.
Explain the bias-variance trade off
Bias reflects how well the model fits the training data (the training error)
Variance is how much the model's predictions vary across different datasets
Generally, lower bias (training error) = higher variance.
The goal of ML is to find the sweet spot
why is dimensionality a problem?
More dimensions = more parameters. Estimating more parameters accurately is harder and takes more time and data
As the number of dimensions increases, the volume of the data space grows exponentially, so data points become sparser relative to the available space. This gives poorer estimates of predictions, because the distances between data points, which are what we use to identify patterns in the data, shrink relative to the total volume of the space. More and more data points are needed in each dimension to provide a good estimate of predictions.
Variables will often be highly correlated. This makes it very difficult to estimate model parameters accurately.
It's hard to interpret models with vast numbers of predictors, as it becomes harder to understand what's actually meaningful within the dataset
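One way to see the sparsity point numerically is below: a minimal sketch assuming numpy is available, with made-up point counts and dimensions. As the number of dimensions grows, pairwise distances between random points become more and more alike, so the distances we rely on to find patterns carry less information.

```python
# Illustrative only (numpy assumed): as dimensionality grows, pairwise distances
# between random points become increasingly similar to one another.
import numpy as np

rng = np.random.default_rng(0)

for d in [2, 10, 100, 1000]:
    X = rng.uniform(size=(200, d))               # 200 random points in d dimensions
    diffs = X[:, None, :] - X[None, :, :]        # all pairwise differences
    dists = np.sqrt((diffs ** 2).sum(axis=-1))   # Euclidean distance matrix
    pairs = dists[np.triu_indices(200, k=1)]     # each pair counted once
    spread = (pairs.max() - pairs.min()) / pairs.mean()
    print(f"d={d:4d}  (max - min) / mean distance = {spread:.2f}")
```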
at how many dimensions does it become completely impossible to estimate the model parameters?
If you have more dimensions than you have data points.
This is because there are then more parameters to estimate than there are observations to estimate them from, so the model cannot be fitted uniquely
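A hypothetical numpy illustration of this (numbers made up): with more predictors than observations, the matrix OLS has to invert is rank-deficient, so there is no single best-fitting set of slopes.

```python
# 10 observations, 50 predictors: OLS has no unique solution.
import numpy as np

rng = np.random.default_rng(1)
n, p = 10, 50
X = rng.normal(size=(n, p))

XtX = X.T @ X                       # the p x p matrix OLS has to invert
print(np.linalg.matrix_rank(XtX))   # 10, not 50 -> infinitely many exact fits
```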
what is regularisation in the context of the bias variance trade off?
Increasing bias to try to reduce variance
We weaken or remove some parameters in the model to simplify the model
This makes the model less accurate in the training dataset, but more generalisable to other datasets
when is regularisation helpful?
When we have correlated predictors in our dataset
When we have high-dimensional and noisy data
why is regularisation good for noisy data with many features?
Because in these datasets some parameters will be overestimated through fitting to noise. These then have an outsized effect on predictions, meaning our model will overfit to the training data
What is ridge (L2) regression?
OLS regression which minimises the sum of squared errors, plus a 'penalty' based on the parameter values
This penalty is essentially an extra loss on the largest predictor coefficients (meaning the predictors that have the highest impact on the dependent variable)
Penalties are higher for models that have many high value parameters
This helps us avoid the problem of the model learning very strong predictive relationships that will not generalise to the testing data
what is the ridge regression cost function formula?
SSE + (sum of squared parameters) × lambda
The penalty is the value of the squared parameters multiplied by lambda
Squaring the parameters means the penalty is always positive.
(parameters = slopes)
So you try to minimise the error as in OLS, but instead of error = SSE, you make the error SSE + (parameters² × lambda). If you plug that into the equation and recalculate the slope and intercept, the line will change slightly.
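As a hedged sketch (numpy assumed, the function and variable names are illustrative, not from the lecture), the cost described above can be written directly:

```python
# Ridge cost as described above: SSE plus lambda times the sum of squared slopes.
import numpy as np

def ridge_cost(X, y, intercept, slopes, lam):
    predictions = intercept + X @ slopes       # the OLS-style prediction line
    sse = np.sum((y - predictions) ** 2)       # ordinary sum of squared errors
    penalty = lam * np.sum(slopes ** 2)        # the intercept is typically not penalised
    return sse + penalty
```

Minimising this, rather than the SSE alone, is what pulls the fitted slopes towards zero.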
what is the OLS equation?
Dependent variable (Y) = y-intercept (B0) + Slope*Independent-variable (B1X) + Error (E)
Extra dimensions add extra IVs and slopes, so terms like B2*A and B3*B may be added (where A and B are extra independent variables)
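A small hedged example (scikit-learn assumed, data simulated) of fitting this equation with two independent variables, so there is an intercept B0 and two slopes B1 and B2:

```python
# Fit Y = B0 + B1*X1 + B2*X2 + E on simulated data and recover the coefficients.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(2)
X = rng.normal(size=(100, 2))                               # two independent variables
y = 1.0 + 2.0 * X[:, 0] - 0.5 * X[:, 1] + rng.normal(scale=0.1, size=100)

ols = LinearRegression().fit(X, y)
print(ols.intercept_, ols.coef_)                            # roughly 1.0 and [2.0, -0.5]
```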
How does the slope relate to the penalty in ridge regression?
When the slope is steep, relatively small changes in X correspond to large changes in y. This means the predictions are very sensitive to small changes in X
As we increase lambda, the slope of our model decreases. This means that as lambda increases, the predictions of our model get less sensitive to changes in X
If lambda is 0, the cost function is the same as ordinary least squares. If lambda is very high, all parameters will end up close to zero.
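A minimal sketch of this shrinkage, assuming scikit-learn (whose Ridge uses alpha where these cards say lambda); the data are simulated just to show the slopes flattening as lambda grows:

```python
# As lambda (alpha) increases, the fitted slopes shrink towards zero.
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(3)
X = rng.normal(size=(100, 3))
y = X @ np.array([3.0, -2.0, 1.0]) + rng.normal(size=100)

for lam in [0.01, 1.0, 10.0, 1000.0]:
    model = Ridge(alpha=lam).fit(X, y)
    print(lam, np.round(model.coef_, 3))   # coefficients get smaller as lambda grows
```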
What is the cost function for logistic ridge regression?
Maximum likelihood. This is because the cost function for logistic regression is based on the likelihood (the sum of the log-likelihoods of the data points) rather than the SSE, so the penalty is added to that instead
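As a hedged illustration (scikit-learn assumed): its LogisticRegression applies an L2 penalty to the likelihood-based cost by default, with C playing the role of 1/lambda, so a smaller C means stronger shrinkage.

```python
# Logistic regression with an L2 (ridge) penalty; smaller C = larger lambda.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(4)
X = rng.normal(size=(200, 5))
y = (X[:, 0] - X[:, 1] + rng.normal(size=200) > 0).astype(int)

for C in [100.0, 1.0, 0.01]:                       # lambda grows from left to right
    clf = LogisticRegression(penalty="l2", C=C).fit(X, y)
    print(C, np.round(clf.coef_, 3))               # coefficients shrink as C falls
```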
How do we choose the value of lambda?
Cross validation
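A minimal sketch of this, assuming scikit-learn's RidgeCV, which tries each candidate lambda (alpha) with cross-validation and keeps the best one; the candidate grid below is arbitrary.

```python
# Choose lambda by cross-validation over a grid of candidate values.
import numpy as np
from sklearn.linear_model import RidgeCV

rng = np.random.default_rng(5)
X = rng.normal(size=(100, 10))
y = 2.0 * X[:, 0] + rng.normal(size=100)

model = RidgeCV(alphas=np.logspace(-3, 3, 13), cv=5).fit(X, y)
print(model.alpha_)                     # the lambda picked by cross-validation
```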
what is the cost function equation of lasso (L1) regularisation?
SSE + |slope| * lambda
This means it takes the absolute value of the slope (instead of squaring the slope as in ridge). Because this penalty does not fade away as a coefficient approaches zero, lasso can shrink some coefficients all the way to zero.
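Mirroring the ridge sketch above (numpy assumed, names illustrative), the lasso cost just swaps the squared penalty for an absolute-value one:

```python
# Lasso cost: SSE plus lambda times the sum of absolute slopes.
import numpy as np

def lasso_cost(X, y, intercept, slopes, lam):
    predictions = intercept + X @ slopes
    sse = np.sum((y - predictions) ** 2)       # ordinary sum of squared errors
    penalty = lam * np.sum(np.abs(slopes))     # absolute values rather than squares
    return sse + penalty
```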
is lasso or ridge better?
Lasso is better at reducing variance when there are lots of noisy variables, because it excludes useless or noisy variables by shrinking their coefficients to zero
Ridge regression performs better when most variables are useful
By excluding some variables, Lasso makes the regression easier to interpret
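A hedged side-by-side (scikit-learn assumed, data simulated): only two of twenty predictors are useful, and lasso sets most of the noise coefficients exactly to zero while ridge only shrinks them.

```python
# Compare how ridge and lasso treat noisy, useless predictors.
import numpy as np
from sklearn.linear_model import Ridge, Lasso

rng = np.random.default_rng(6)
X = rng.normal(size=(100, 20))                       # 2 useful + 18 noise features
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(size=100)

ridge = Ridge(alpha=1.0).fit(X, y)
lasso = Lasso(alpha=0.1).fit(X, y)
print("ridge coefficients at zero:", np.sum(ridge.coef_ == 0))   # usually 0
print("lasso coefficients at zero:", np.sum(lasso.coef_ == 0))   # usually most of the 18
```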