Week 10 - Multiple regression Flashcards
Multiple regression
Linear regression model with multiple predictors
Interpretation of regression coefficients
The expected change in y for a one-unit increase in a given predictor, holding the other predictors fixed
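For example, a minimal sketch in R (the data frame df and the variables y, x1, x2 are hypothetical):

```r
# Fit a linear model of y on two predictors (names are hypothetical)
model <- lm(y ~ x1 + x2, data = df)

# Each coefficient is the expected change in y for a one-unit
# increase in that predictor, holding the other predictors fixed
summary(model)
```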
Influential data points
Data points that have a disproportionately large impact on the model fit
Typically undesirable (we want the model to “use” all of the data)
Two ways to identify such points: Leverage and Cook’s distance
Leverage
How far an observation is from the main cluster of values, based ONLY on the predictor (x) values
The further away it is, the more it can “tilt” the regression line
Changing the y-value of a high-leverage point can noticeably shift the entire regression line
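A toy sketch of that effect on simulated data: point 21 sits far out on the x-axis, so changing only its y-value swings the fitted slope.

```r
set.seed(1)
x <- c(rnorm(20), 10)   # point 21 is far from the cluster on the x-axis
y <- 2 * x + rnorm(21)  # all points lie near the line y = 2x

y2 <- y
y2[21] <- -10           # change only the high-leverage point's y-value

coef(lm(y ~ x))         # slope close to 2
coef(lm(y2 ~ x))        # slope dragged well away from 2
```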
Cook’s distance
Measures the influence of a given data point on the fitted model
Uses the response variable (Y) and the predictors (X)
Combines information about both:
- leverage (how far an observation’s x-value is from the mean of all x-values)
- residuals (how far the actual y-value is from the predicted y-value)
If we were to remove a point, how would the line of best fit change?
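For reference, one standard closed form: for observation i with leverage h_i and standardised residual r_i, in a model with p parameters,
D_i = (r_i^2 / p) × h_i / (1 − h_i)
so a large residual and high leverage both inflate D_i.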
Finding highly influential points
Points whose leverage score or Cook’s D exceeds the threshold 2p/n, where:
- p is the number of predictor variables (includes intercept/constant)
- n is the number of observations
Compute the score for each point; any point whose score exceeds the threshold can be flagged as highly influential (see the sketch below). In the per-observation output of R’s broom::augment():
- “.hat” is the leverage score
- “.cooksd” is Cook’s D
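A sketch of that workflow (the model and data frame are hypothetical):

```r
library(broom)

model <- lm(y ~ x1 + x2, data = df)   # hypothetical model
aug   <- augment(model)               # per-observation .hat, .cooksd, etc.

p <- length(coef(model))              # number of parameters, incl. intercept
n <- nrow(df)
threshold <- 2 * p / n

# Flag points whose leverage or Cook's D exceeds the threshold
aug[aug$.hat > threshold | aug$.cooksd > threshold, ]
```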
Collinearity
Where some predictors are highly correlated with each other
Impacts of Collinearity
- Can’t disentangle their influence
- Uncertain estimates of regression coefficients
- Model has to “choose” whether to favour one or the other
- Numerical instability
Possible solutions to Collinearity
- Remove some predictors (consider them redundant)
- Ignore the problem (if the main interest is prediction rather than interpreting individual coefficients)
Variance inflation factor (VIF)
Used to assess the degree of collinearity among predictors: VIF_j = 1 / (1 − R_j^2), where R_j^2 is the R^2 from regressing predictor j on all the other predictors
VIF values greater than 10 are considered high
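A minimal sketch using car::vif() (the model and variables are hypothetical):

```r
library(car)

model <- lm(y ~ x1 + x2 + x3, data = df)
vif(model)   # one VIF per predictor; values above 10 suggest strong collinearity
```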
Model performance measures
Determine which predictors will give us a good model
Log-likelihood (maximised) - Bad -> adding predictors never decreases the likelihood
R^2 - Bad -> adding predictors never decreases R^2
Adjusted R^2 - Good -> adds a “penalty” for extra predictors
AIC (the smaller the better) - Good -> a penalised likelihood measure
BIC (the smaller the better) - Good -> a penalised likelihood measure
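All of these are available in base R; a sketch comparing two nested (hypothetical) models:

```r
m1 <- lm(y ~ x1, data = df)
m2 <- lm(y ~ x1 + x2, data = df)

logLik(m1); logLik(m2)                        # never decreases as predictors are added
summary(m1)$r.squared; summary(m2)$r.squared  # same problem
summary(m2)$adj.r.squared                     # penalises extra predictors
AIC(m1, m2)                                   # smaller is better
BIC(m1, m2)                                   # smaller is better
```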
AIC vs BIC
The AIC is usually better for choosing a good predictive model (we prefer AIC)
The BIC is usually better for choosing a “true” model
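The difference comes from the penalty terms: for a model with p estimated parameters and n observations, AIC = −2 log L + 2p while BIC = −2 log L + p log(n). Since log(n) > 2 once n ≥ 8, BIC penalises extra parameters more heavily and so tends to select smaller models.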