Week Nine - Multiple Linear Regression Flashcards
What are the 5 assumptions of linear regression?
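CORRECT VARIABLES: variables should be measured on an interval scale
INDEPENDENCE OF DATA: each person should participate only once
SAMPLE SIZE/NORMALITY: the larger the sample size (SS), the better; residuals should be approximately normally distributed
LINEARITY: the relationship between the predictors and the outcome should be linear; a strong but non-linear relationship can produce a correlation of zero
HOMOSCEDASTICITY OF RESIDUALS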
What is homoscedasticity of residuals?
The error variance should be the same at each level of the predictor
How can we test for normality?
Q-Q plots: the points should fall along a straight line
Normality test under Assumption Checks in jamovi (J)
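With residual plots (under Assumption Checks), what do we hope the data will look like?
A rectangular band of points with no pattern (e.g. no funnel or curve) and no outliers
What does heteroscedasticity imply, and what do the test results need to be?
That the model is only accurate for certain ranges of scores, because the prediction error varies across levels of the predictor
The assumption-check tests should be non-significant (p > .05), meaning there is no significant evidence that the assumption is violated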
How do we test for outliers? (2)
Basic approach: flag any case whose standardized residual is more than 3 SD from the mean
Cook's distance: a measure of the influence of one case on the model as a whole
Values > 1 may be a concern
What is collinearity?
When some of the IVs are highly correlated with each other, meaning that each provides little unique information
What does the model coefficients box tell us in simultaneous multiple regression?
They assess whether each regression coefficient is significantly different from zero after accounting for the other predictors; a predictor with p > .05 does not add significant unique prediction
What is tolerance?
The proportion of variability in a predictor that is not explained by the other predictors (1 - R² from regressing that predictor on the others). Values < 0.1 are a problem: most of the variation in that IV is explained by the others
What VIF (variance inflation factor) values are a problem?
Values > 10 (VIF = 1/tolerance, so this corresponds to tolerance < 0.1)
What is the underlying question in hierarchical regression?
Whether a predictor can add to the prediction of an outcome variable beyond the variance already explained by the predictor(s) entered earlier
What does the output of a hierarchical multiple regression (HMR) tell us?
It gives the change in R² (ΔR²), which tells us how much additional variation in the outcome is explained by adding the extra predictor(s)
It also gives an F-change (ΔF) statistic and a p value, which test whether that extra explained variation is significant
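The ΔR² and F-change values follow from the two models' R² values. This is a minimal sketch with hypothetical numbers: R² = .20 at step 1, R² = .30 at step 2, n = 100, and one predictor added to make two in total. The formula is F-change = (ΔR² / df1) / ((1 - R²_full) / df2), with df1 = number of predictors added and df2 = n - k_full - 1. The p value comes from the F(df1, df2) distribution. The Python standard library has no F distribution, but if SciPy is installed, `scipy.stats.f.sf(f_change, df1, df2)` returns it. `r2_change_test` is a hypothetical helper.

```python
def r2_change_test(r2_reduced, r2_full, n, k_full, k_added):
    """Return (delta R², F-change, df1, df2) for adding k_added predictors.

    r2_reduced: R² before the new predictors are entered
    r2_full:    R² after they are entered
    n:          sample size
    k_full:     total number of predictors in the full model
    """
    delta_r2 = r2_full - r2_reduced
    df1, df2 = k_added, n - k_full - 1
    f_change = (delta_r2 / df1) / ((1 - r2_full) / df2)
    return delta_r2, f_change, df1, df2

# Hypothetical step 1 / step 2 results
delta_r2, f_change, df1, df2 = r2_change_test(0.20, 0.30, n=100, k_full=2, k_added=1)
print(f"Delta R2 = {delta_r2:.2f} ({delta_r2:.0%} extra variance), "
      f"F({df1}, {df2}) = {f_change:.2f}")
```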
What does HMR examine, and what is it driven by?
The incremental importance of a predictor variable
It is driven by hypotheses: the researcher specifies the order in which predictors are entered
What do you use when you don’t have a hypothesis for your regression?
Use a step-wise approach
What is a forward stepwise entry approach?
The best predictor is entered into the model first, and further predictors are entered only if they improve the predictive model (i.e. significantly increase R²)
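Forward entry can be sketched as a loop. At each step, the loop tries every remaining predictor, keeps the one that gives the highest R², and stops when the best improvement is too small. This is a minimal, standard-library-only sketch with made-up data, and `r_squared`, `forward_stepwise` and the fixed `min_gain` threshold are hypothetical. Statistics packages usually judge whether the R² increase is significant with an F-change test rather than a fixed cut-off.

```python
def r_squared(columns, y):
    """R² from an OLS fit of y on the given predictor columns plus an intercept,
    solving the normal equations (X'X)b = X'y by Gauss-Jordan elimination."""
    n, k = len(y), len(columns) + 1
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    # Augmented matrix [X'X | X'y]
    A = [[sum(X[r][i] * X[r][j] for r in range(n)) for j in range(k)]
         + [sum(X[r][i] * y[r] for r in range(n))]
         for i in range(k)]
    for c in range(k):
        # Partial pivoting: move the row with the largest entry in column c up
        p = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        for r in range(k):
            if r != c:
                factor = A[r][c] / A[c][c]
                A[r] = [a - factor * b for a, b in zip(A[r], A[c])]
    coefs = [A[i][k] / A[i][i] for i in range(k)]
    fitted = [sum(b * x for b, x in zip(coefs, row)) for row in X]
    y_bar = sum(y) / n
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    sst = sum((yi - y_bar) ** 2 for yi in y)
    return 1 - sse / sst

def forward_stepwise(predictors, y, min_gain=0.01):
    """Add the predictor giving the largest R² at each step; stop when the best
    remaining gain in R² is below min_gain (a hypothetical fixed threshold)."""
    selected, current_r2 = [], 0.0  # intercept-only model has R² = 0
    while len(selected) < len(predictors):
        candidates = [name for name in predictors if name not in selected]
        scores = {name: r_squared([predictors[s] for s in selected + [name]], y)
                  for name in candidates}
        best = max(scores, key=scores.get)
        if scores[best] - current_r2 < min_gain:
            break
        selected.append(best)
        current_r2 = scores[best]
    return selected, current_r2

# Made-up data: y is built from a and b only
a = list(range(1, 11))
b = [2, 7, 1, 8, 2, 8, 1, 8, 2, 8]
c = [5, 3, 6, 2, 7, 1, 8, 4, 9, 0]
y = [3 * ai + 2 * bi for ai, bi in zip(a, b)]
print(forward_stepwise({"a": a, "b": b, "c": c}, y))
```

Here `a` enters first because it predicts y best on its own. `b` enters next because it raises R² to 1. `c` is never entered because it cannot improve the fit.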