Week 10 - Multiple regression Flashcards

1
Q

Multiple regression

A

Linear regression model with multiple predictors
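
A minimal R sketch of fitting such a model with lm() (the mtcars data and the mpg, wt, hp variables are illustrative examples, not from the course):

# Fit a linear regression with two predictors
fit <- lm(mpg ~ wt + hp, data = mtcars)
summary(fit)   # coefficients, R^2, residuals, etc.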

2
Q

Interpretation of regression coefficients

A

The expected change in y for a one-unit increase in a given predictor, holding the other predictors fixed
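
A sketch of how this reads in R output (again using the illustrative mtcars example):

fit <- lm(mpg ~ wt + hp, data = mtcars)
coef(fit)
# The wt coefficient is the expected change in mpg for a one-unit
# increase in wt, holding hp fixed (and likewise for hp)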

3
Q

Influential data points

A

Data points that have a disproportionately large impact on the model fit
Typically undesirable (we want the model to “use” all of the data)

Two ways to identify such points: leverage and Cook’s distance

4
Q

Leverage

A

How far an observation is from the main cluster of values, based ONLY on the x-axis (the predictor values)

The further away it is, the more it can “tilt” the regression model

Changing the y-value of a high-leverage point can noticeably shift the entire regression line
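
A small sketch with made-up data showing how moving the y-value of one high-leverage point tilts the fitted line:

set.seed(1)
x <- c(rnorm(20), 10)        # the last point sits far out on the x-axis
y <- x + rnorm(21)
before <- coef(lm(y ~ x))
y[21] <- -10                 # change only that point's y-value
after <- coef(lm(y ~ x))
rbind(before, after)         # the slope shifts noticeably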

5
Q

Cook’s distance

A

Measures the influence of a given data point

Uses the response variable (Y) and the predictors (X)

Combines information about both:
- leverage (how far an observation’s x-value is from the mean of all x-values)
- residuals (how far the actual y-value is from the predicted y-value)

If we were to remove a point, how would the line of best fit change?

6
Q

Finding highly influential points

A

Points with a leverage score or Cook’s D that exceeds the threshold:

2p/n
- p is the number of model coefficients (including the intercept/constant)
- n is the number of observations

Compute the score for each point; any point whose score exceeds the threshold can be considered highly influential
- “.hat” is the leverage score
- “.cooksd” is Cook’s D
(both are per-observation columns added by broom::augment() in R)
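
A sketch of flagging influential points with broom::augment() and the 2p/n rule (the mtcars model is illustrative):

library(broom)
fit <- lm(mpg ~ wt + hp, data = mtcars)
aug <- augment(fit)              # adds .hat and .cooksd columns
p <- length(coef(fit))           # number of coefficients, incl. intercept
n <- nrow(mtcars)
threshold <- 2 * p / n
aug[aug$.hat > threshold | aug$.cooksd > threshold, ]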

7
Q

Collinearity

A

Where some predictors are highly correlated with each other

8
Q

Impacts of Collinearity

A
  • Can’t disentangle the individual influence of the correlated predictors
  • Uncertain estimates of regression coefficients
  • Model has to “choose” whether to favour one or the other
  • Numerical instability
9
Q

Possible solutions to Collinearity

A
  • Remove some predictors (consider them redundant)
  • Ignore the problem (if we are mainly interested in prediction rather than interpretation)
10
Q

Variance inflation factor (VIF)

A

Used to assess the degree of collinearity among predictor variables

For predictor j: VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing predictor j on all the other predictors

VIF values greater than 10 are considered high
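
A sketch of computing VIFs with the car package (the mtcars model is illustrative):

library(car)
fit <- lm(mpg ~ wt + hp + disp, data = mtcars)
vif(fit)   # values above 10 suggest problematic collinearity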

11
Q

Model performance measures

A

Used to determine which predictors will give us a good model:

Log-likelihood (maximised) - Bad -> adding more predictors always increases the likelihood

R^2 - Bad -> adding more predictors always increases R^2

Adjusted R^2 - Good -> adds a “penalty” for adding extra predictors

AIC (smaller is better) - Good -> a penalised likelihood measure: AIC = 2k - 2 log(L)

BIC (smaller is better) - Good -> a penalised likelihood measure: BIC = k log(n) - 2 log(L)

(k = number of estimated parameters, L = maximised likelihood, n = number of observations)
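
A sketch of comparing candidate models on these measures (illustrative mtcars models):

fit1 <- lm(mpg ~ wt, data = mtcars)
fit2 <- lm(mpg ~ wt + hp, data = mtcars)
AIC(fit1, fit2)                  # smaller is better
BIC(fit1, fit2)                  # smaller is better
summary(fit2)$adj.r.squared      # adjusted R^2 with its penalty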

12
Q

AIC vs BIC

A

The AIC is usually better for choosing a good predictive model (we prefer AIC)

The BIC is usually better for choosing a “true” model
