lecture 7 - regression assumptions and quality checks Flashcards
what 3 things does regression capture?
- associations
- correlations
- effects
what 3 things does multiple regression allow us to do?
- bring together lots of different variables and assess their importance simultaneously
- isolate the independent effect of each variable, controlling for the others
- make predictions based on the effects seen
which variable isn’t in the multiple regression equation?
the reference category (the omitted category of a dummy-coded variable, which the included categories are compared against)
what is y hat?
the predicted y
what does leverage mean?
some cases (those with extreme values) contribute more than others to the estimation of the effects; a regression model should not be driven by a few high-leverage points
key assumptions
- linearity
- no multicollinearity (the x variables should be independent of one another)
- no influential outliers or leverage points (extreme value points)
best way to check for linearity?
a partial plot: plot the residuals of y against the residuals of the x variable of interest, with both sets of residuals taken from regressions on the other x variables
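a minimal sketch of a partial plot in Python (not from the lecture): the simulated x1, x2 and y values are invented for illustration, and statsmodels and matplotlib are assumed to be available

```python
# Partial (added-variable) plot: regress y on the other x variables, regress
# the x of interest on those same variables, then plot residuals against
# residuals. A straight-line pattern suggests linearity holds for that predictor.
import numpy as np
import statsmodels.api as sm
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)          # predictor of interest
x2 = rng.normal(size=n)          # control variable
y = 2 * x1 + 0.5 * x2 + rng.normal(size=n)

others = sm.add_constant(x2)                 # everything except x1
resid_y = sm.OLS(y, others).fit().resid      # y residuals
resid_x = sm.OLS(x1, others).fit().resid     # x residuals

plt.scatter(resid_x, resid_y, s=10)
plt.xlabel("x1 residuals")
plt.ylabel("y residuals")
plt.title("partial plot for x1")
plt.show()
```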
if there isn’t linearity what can you do?
transform the data, e.g. by taking logs
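a tiny illustration of a log transform in Python; the income values are invented and only show how the transformation is applied before refitting

```python
import numpy as np

# Income-style data are typically right-skewed; taking logs pulls in the
# long tail so the relationship with y is closer to linear.
income = np.array([12_000.0, 25_000.0, 40_000.0, 250_000.0])   # invented values
income_logged = np.log(income)     # use np.log1p instead if zeros are possible
print(income_logged.round(2))
```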
when is multicollinearity considered a problem?
when the correlation between two x variables is more than +/- 0.8
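a minimal sketch of this screen in Python with pandas; the three simulated x variables (x2 is deliberately entangled with x1) are made up for illustration

```python
# Check pairwise correlations between the x variables against the +/- 0.8
# rule of thumb and report any pair that crosses it.
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
x1 = rng.normal(size=100)
x2 = 0.9 * x1 + rng.normal(scale=0.2, size=100)   # entangled with x1 on purpose
x3 = rng.normal(size=100)
X = pd.DataFrame({"x1": x1, "x2": x2, "x3": x3})

corr = X.corr()
print(corr.round(2))

for i, a in enumerate(corr.columns):
    for b in corr.columns[i + 1:]:
        if abs(corr.loc[a, b]) > 0.8:
            print(f"possible multicollinearity: {a} & {b} (r = {corr.loc[a, b]:.2f})")
```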
why is multicollinearity a problem?
- it produces misleading and unstable coefficient estimates
- if the Xs are entangled, we cannot tell where to place the effect
what do outliers and leverage points do to the b coefficient?
cause it to shift dramatically on the basis of just one or a few cases
what is an outlier?
an extreme data point: a case whose y value is badly predicted by the model, leaving a large residual (error)
how to check for outliers?
look at the standardised residuals: roughly 95% of cases should fall within +/- 2 and about 99% within +/- 3, so cases beyond these cut-offs are potential outliers
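a minimal sketch of this check in Python using statsmodels; the simulated data and the planted bad case are invented for illustration

```python
# Flag potential outliers from the standardised residuals: roughly 95% of
# cases should sit within +/- 2 and about 99% within +/- 3, so anything
# beyond those bounds is worth inspecting.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
X = sm.add_constant(rng.normal(size=(100, 2)))
y = X @ np.array([1.0, 2.0, -1.0]) + rng.normal(size=100)
y[5] += 8                                   # plant one badly predicted case

influence = sm.OLS(y, X).fit().get_influence()
std_resid = influence.resid_studentized_internal

print("cases beyond +/- 2:", np.where(np.abs(std_resid) > 2)[0])
print("cases beyond +/- 3:", np.where(np.abs(std_resid) > 3)[0])
```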
what is the equation for checking leverage points?
3(k+1)/n
k = number of independent variables
n = sample size
2 ways to check for leverage points?
- the leverage cut-off equation, 3(k+1)/n
- Cook's distance
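a minimal sketch of both checks in Python using statsmodels; the simulated data, the planted extreme case, and the 4/n Cook's distance rule of thumb mentioned in the comments are assumptions for illustration, not part of the lecture

```python
# Check leverage two ways on simulated data:
# 1) compare each hat value (leverage) with the 3(k+1)/n cut-off
# 2) compute Cook's distance (how much the fit changes if a case is dropped);
#    a common rule of thumb flags values above 4/n.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
x = rng.normal(size=(80, 2))
x[0] = [6.0, 6.0]                              # plant one extreme-x case
X = sm.add_constant(x)
y = X @ np.array([0.5, 1.0, 1.5]) + rng.normal(size=80)
y[0] += 10                                     # and make it badly predicted too

influence = sm.OLS(y, X).fit().get_influence()

k, n = 2, len(y)                               # k independent variables, n cases
cutoff = 3 * (k + 1) / n
leverage = influence.hat_matrix_diag
print("leverage cut-off:", round(cutoff, 3))
print("high-leverage cases:", np.where(leverage > cutoff)[0])

cooks_d, _ = influence.cooks_distance          # second value is p-values
print("cases with Cook's distance above 4/n:", np.where(cooks_d > 4 / n)[0])
```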