lecture 7 - regression assumptions and quality checks Flashcards

1
Q

what 3 things does regression capture?

A
  1. associations
  2. correlations
  3. effects
2
Q

what 3 things does multiple regression allow us to do?

A
  1. bring together lots of different variables and assess their importance simultaneously
  2. isolate the independent effect of each variable, controlling for the others
  3. make predictions based on effects seen
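
A minimal sketch of these three points, using NumPy's least-squares solver on made-up data. The variable names and numbers are hypothetical, not from the lecture, and the data has no noise so the fitted coefficients can be read off exactly.

```python
import numpy as np

# Hypothetical data built so that y = 1 + 2*x1 - 3*x2 exactly (no noise).
x1 = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
x2 = np.array([1.0, 0.0, 2.0, 1.0, 3.0, 0.0])
y = 1 + 2 * x1 - 3 * x2

# Design matrix: a column of ones for the intercept, then both x variables.
X = np.column_stack([np.ones_like(x1), x1, x2])

# Ordinary least squares estimates all coefficients simultaneously.
b, *_ = np.linalg.lstsq(X, y, rcond=None)

# Each slope is the effect of one variable with the other held constant.
intercept, b_x1, b_x2 = b

# Prediction for a new case with x1 = 2 and x2 = 1.
y_hat_new = intercept + b_x1 * 2 + b_x2 * 1
```

Here `b` recovers the intercept and the two slopes used to build the data, and `y_hat_new` is the prediction for the new case.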
3
Q

which variable isn’t in the multiple regression equation?

A

the reference category
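
A small sketch of why the reference category drops out, assuming a hypothetical categorical variable with three groups ("a", "b", "c") coded as dummy variables. Only "b" and "c" get a column. "a" is absorbed into the intercept, and the other coefficients are differences from it.

```python
import numpy as np

# Hypothetical data: group means are a = 10, b = 12, c = 15.
group = ["a", "a", "b", "b", "c", "c"]
y = np.array([9.0, 11.0, 11.0, 13.0, 14.0, 16.0])

# Dummy variables for "b" and "c" only; "a" is the reference category.
is_b = np.array([1.0 if g == "b" else 0.0 for g in group])
is_c = np.array([1.0 if g == "c" else 0.0 for g in group])

X = np.column_stack([np.ones(len(group)), is_b, is_c])
b, *_ = np.linalg.lstsq(X, y, rcond=None)

# intercept = mean of the reference group; b_b and b_c = differences from it.
intercept, b_b, b_c = b
```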

4
Q

what is y hat?

A

the predicted y
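
Written out, assuming the usual notation of an intercept b_0 and k x variables with slopes b_1 to b_k (these symbols are an assumption, the notes do not give them), the prediction and the error for each case are:

```latex
\hat{y} = b_0 + b_1 x_1 + b_2 x_2 + \dots + b_k x_k,
\qquad
e = y - \hat{y}
```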

5
Q

what does leverage mean?

A

some cases contribute more than others to the estimation of the effects because they are extreme value points, typically on the x variables - the results of a regression model should not depend on a few high-leverage cases

6
Q

what are the key assumptions to check?

A
  1. linearity
  2. no multicollinearity - the x variables should be independent of each other
  3. no undue leverage or outliers (extreme value points)
7
Q

best way to check for linearity?

A

partial plot - plot the y residuals against the x residuals, i.e. what is left of y and of that x once the other x variables have been accounted for
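
A rough sketch of how the two sets of residuals for a partial plot could be computed with NumPy. The data and the `residuals` helper are hypothetical. Plotting `x_res` on the horizontal axis against `y_res` on the vertical axis gives the partial plot for x1, and a roughly straight band of points suggests linearity.

```python
import numpy as np

def residuals(target, predictors):
    """Residuals from regressing target on predictors plus an intercept."""
    X = np.column_stack([np.ones(len(target)), predictors])
    coefs, *_ = np.linalg.lstsq(X, target, rcond=None)
    return target - X @ coefs

# Hypothetical data with two x variables and a little noise.
x1 = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0])
x2 = np.array([1.0, 0.0, 2.0, 1.0, 3.0, 0.0, 2.0, 4.0])
noise = np.array([0.3, -0.2, 0.1, -0.4, 0.2, 0.0, -0.1, 0.1])
y = 1 + 2 * x1 - 3 * x2 + noise

# Partial plot for x1: take the other x variable (x2) out of both y and x1.
y_res = residuals(y, x2)
x_res = residuals(x1, x2)

# Slope of the partial plot (line through the residual cloud).
partial_slope = (x_res @ y_res) / (x_res @ x_res)

# Full multiple regression, for comparison.
X_full = np.column_stack([np.ones(len(y)), x1, x2])
b_full, *_ = np.linalg.lstsq(X_full, y, rcond=None)
```

The slope of the partial plot matches the b coefficient for x1 in the full model, which is why the plot isolates that variable's relationship with y.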

8
Q

if there isn’t linearity what can you do?

A

transform the data, e.g. take the log of the variable
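
A small illustration, assuming a hypothetical y that grows exponentially with x. The raw relationship is curved, but after logging y it becomes a straight line.

```python
import numpy as np

# Hypothetical curved relationship: y grows exponentially with x.
x = np.arange(1.0, 9.0)
y = np.exp(0.5 * x)

# Correlation measures how well a straight line describes the relationship.
r_raw = np.corrcoef(x, y)[0, 1]          # before the transformation
r_log = np.corrcoef(x, np.log(y))[0, 1]  # after logging y
```

`r_log` is higher than `r_raw` because log(y) = 0.5x is exactly linear in x.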

9
Q

when is multicollinearity considered a problem?

A

when the correlation between two x variables is stronger than +/- 0.8 (above 0.8 or below -0.8)
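
One way this check might be scripted is to build a correlation matrix of the x variables and flag any pair beyond the 0.8 rule of thumb. The three predictors below are made up, with x2 deliberately close to 2 * x1.

```python
import numpy as np

# Hypothetical x variables; x2 is nearly a multiple of x1.
predictors = {
    "x1": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "x2": [2.1, 3.9, 6.2, 8.1, 9.8, 12.2],
    "x3": [3.0, 1.0, 4.0, 1.0, 5.0, 2.0],
}
names = list(predictors)

# Each row passed to corrcoef is treated as one variable.
corr = np.corrcoef([predictors[name] for name in names])

THRESHOLD = 0.8  # rule of thumb from the card above

# Collect every pair of x variables whose correlation is too strong.
flagged = [
    (names[i], names[j])
    for i in range(len(names))
    for j in range(i + 1, len(names))
    if abs(corr[i, j]) > THRESHOLD
]
```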

10
Q

why is multicollinearity a problem?

A

causes misleading and unstable results

- if Xs are entangled, where do we place the effect?

11
Q

what do outliers and leverage points do to the b coefficient?

A

cause it to shift dramatically based on one or a few cases

12
Q

what is an outlier?

A

an extreme data point - a case whose y value is badly predicted, leaving a large error (residual)

13
Q

how to check for outliers

A

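look at standardised residuals - about 95% of cases should fall within +/- 2 and over 99% within +/- 3, so a case beyond +/- 3 is very likely an outlier and cases beyond +/- 2 are worth checking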
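
A minimal sketch of the check, assuming standardised residuals are computed as each residual divided by the residual standard error (statistical packages may define them slightly differently). The data is hypothetical, with case 10 deliberately pushed far off the line.

```python
import numpy as np

# Hypothetical data: a straight line with small noise and one bad case.
x = np.arange(20, dtype=float)
noise = np.where(np.arange(20) % 2 == 0, 0.1, -0.1)
y = 2 + 0.5 * x + noise
y[10] += 5.0  # case 10 is deliberately far off the line

# Fit the regression and get the raw residuals (y minus predicted y).
X = np.column_stack([np.ones_like(x), x])
b, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ b

n, p = X.shape                        # p = k + 1 estimated coefficients
s = np.sqrt(resid @ resid / (n - p))  # residual standard error
z = resid / s                         # standardised residuals

# Cases beyond the +/- 2 and +/- 3 cut-offs.
beyond_2 = np.where(np.abs(z) > 2)[0].tolist()
beyond_3 = np.where(np.abs(z) > 3)[0].tolist()
```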

14
Q

what is the equation for checking leverage points?

A

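cut-off = 3(k+1)/n, i.e. three times the average leverage - cases with a leverage value above this need checking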
k = number of independent variables
n = sample size
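
A sketch of applying the cut-off, assuming leverage values are taken from the diagonal of the hat matrix (the standard definition, though the notes do not spell it out). The data is hypothetical: one x variable, ten cases, and the last case has an extreme x value.

```python
import numpy as np

# Hypothetical x values; the last case is far from the rest.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 30.0])
X = np.column_stack([np.ones_like(x), x])

n = len(x)  # sample size
k = 1       # number of independent variables

# Leverage of each case = diagonal of the hat matrix X (X'X)^-1 X'.
H = X @ np.linalg.inv(X.T @ X) @ X.T
leverage = np.diag(H)

# Cut-off from the card above.
cutoff = 3 * (k + 1) / n

# Indices of cases whose leverage exceeds the cut-off.
flagged = np.where(leverage > cutoff)[0].tolist()
```

The leverage values sum to k + 1, so the average is (k + 1)/n and the cut-off flags cases at more than three times that average. Only the last case is flagged here.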

15
Q

2 ways to check for leverage points?

A

the leverage cut-off equation, 3(k+1)/n

Cook's distance
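
A rough sketch of Cook's distance using its standard formula, which combines each case's residual with its leverage. The data is hypothetical: the last case has an extreme x value and also sits far below the line the other cases follow. A common rule of thumb, not taken from these notes, treats values above 1 as worth a closer look.

```python
import numpy as np

# Hypothetical data: nine cases on the line y = 2 + 0.5x, one far off it.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 30.0])
y = 2 + 0.5 * x
y[-1] = 5.0  # the line through the other cases would give 17 here

# Fit the regression and get the residuals.
X = np.column_stack([np.ones_like(x), x])
b, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ b

n, p = X.shape  # p = k + 1 estimated coefficients
leverage = np.diag(X @ np.linalg.inv(X.T @ X) @ X.T)
mse = resid @ resid / (n - p)

# Cook's distance: large when a case has both a big residual and high leverage.
cooks_d = (resid**2 / (p * mse)) * leverage / (1 - leverage) ** 2

most_influential = int(np.argmax(cooks_d))
```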
