LECTURE 3 AND 4: Multiple Regression Flashcards
what is multiple regression
using the linear regression model to predict the value of one outcome variable from several predictor variables - a hypothetical relationship between several variables
what equation does the multiple regression use
an expansion of the straight-line equation
y = b0 + b1X1 + b2X2 + … + bnXn + error
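the equation above can be sketched in NumPy by solving least squares on made-up data (all names and values here are hypothetical, just to illustrate recovering b0, b1, b2):

```python
import numpy as np

# hypothetical data: two predictors and a known "true" relationship
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))                        # predictors X1, X2
y = 2.0 + 1.5 * X[:, 0] - 0.5 * X[:, 1] + rng.normal(scale=0.1, size=100)

design = np.column_stack([np.ones(100), X])          # columns: [1, X1, X2]
b, *_ = np.linalg.lstsq(design, y, rcond=None)       # b = [b0, b1, b2]
print(b)  # estimates should be close to 2.0, 1.5, -0.5
```

the column of ones is what gives the intercept b0; each remaining coefficient is the slope for its predictor.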
forced entry
all predictors entered simultaneously
hierarchical
experimenter decides the order of entry of variables based on theoretical background
allows you to observe the unique predictive influence of a new variable on the outcome, as known predictors are held constant
stepwise
predictors are selected using their semi-partial correlation with the outcome
r value
the correlation between the observed values and the values predicted by the model
r square
proportion of variance accounted for by the model
adjusted r square
estimate of r square in the population (corrects for shrinkage) - estimates the change in r square when generalising from the sample to the population
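the shrinkage correction has a standard formula (Wherry's adjusted R square); a minimal sketch, with made-up sample values:

```python
def adjusted_r2(r2, n, k):
    """Adjusted R^2: shrinks sample R^2 toward the expected population value.
    n = sample size, k = number of predictors."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

# hypothetical example: R^2 = .60 from 50 cases and 3 predictors
print(round(adjusted_r2(0.60, 50, 3), 3))  # 0.574
```

the adjustment shrinks R square more when the sample is small or the predictor count is large.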
beta values
change in the outcome associated with a unit change in the predictor - also has a standardised (SD) version: increasing the predictor by 1 SD changes the outcome by ___ SDs
t test (coefficients table)
tells whether each IV makes a significant contribution to predicting the DV (i.e. whether its coefficient is significantly different from 0) - report as t = __, p = __
how to interpret SD beta values
when the predictor increases by 1 SD, the predicted value increases by ___ (beta) of a SD
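the standardised beta can be obtained from the unstandardised b by rescaling with the two standard deviations; a tiny sketch with assumed (made-up) SDs:

```python
# hypothetical values: unstandardised slope and the two SDs
b = 0.8       # unstandardised coefficient (assumed)
sd_x = 2.5    # SD of the predictor (assumed)
sd_y = 4.0    # SD of the outcome (assumed)

beta = b * sd_x / sd_y   # standardised beta
print(beta)  # 0.5 -> a 1 SD rise in the predictor predicts a 0.5 SD rise in y
```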
r square change (hierarchical)
shows how much more of the model's variance the second variable has accounted for
i.e. r square of model 1 is the variance accounted for by the 1st variable;
r square of model 2 is the variance accounted for by both variables - the r square change is the specific variance accounted for by adding the second variable
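the hierarchical comparison above can be sketched by fitting the two nested models and differencing their R square values (data and helper name are hypothetical):

```python
import numpy as np

def r_squared(X, y):
    """R^2 from an ordinary least-squares fit with an intercept (sketch)."""
    design = np.column_stack([np.ones(len(y)), X])
    b, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ b
    return 1 - (resid @ resid) / ((y - y.mean()) @ (y - y.mean()))

# made-up data where x2 genuinely adds predictive power
rng = np.random.default_rng(1)
x1 = rng.normal(size=200)
x2 = rng.normal(size=200)
y = x1 + 0.5 * x2 + rng.normal(size=200)

r2_model1 = r_squared(x1[:, None], y)               # predictor entered first
r2_model2 = r_squared(np.column_stack([x1, x2]), y) # both predictors
r2_change = r2_model2 - r2_model1                   # unique variance from x2
print(r2_change)
```

for nested OLS models the change can never be negative - adding a predictor cannot reduce R square in the sample.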
how to report anova
F(df, residual df) = __, p = __
how can you assess accuracy of the multiple regression model
standardised residuals and influential cases (Cook's distance)
describe standardised residuals for accuracy of the multiple regression model
in an average sample, 95% of standardised residuals should lie between ±2
99% should lie between ±2.5
cases with absolute values > 3 are likely outliers
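the ±2 / ±2.5 rule follows from the normal distribution; a quick sketch on simulated residuals (the data here stand in for residuals from a hypothetical model):

```python
import numpy as np

rng = np.random.default_rng(2)
resid = rng.normal(size=1000)                     # stand-in model residuals
z = (resid - resid.mean()) / resid.std(ddof=1)    # standardised residuals

within2 = (np.abs(z) < 2).mean()
within2_5 = (np.abs(z) < 2.5).mean()
print(within2)    # roughly 0.95
print(within2_5)  # roughly 0.99
```

if far more than 5% of cases fall outside ±2, the model is a poor fit to the sample.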
describe Cook's distance for accuracy of the multiple regression model
measures influence of a single case on the whole model
cause for concern = absolute values > 1
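a minimal sketch of computing Cook's distance by hand (function name and data are hypothetical; statistical packages report this for you):

```python
import numpy as np

def cooks_distance(X, y):
    """Cook's distance for each case (sketch, intercept included).
    D_i = e_i^2 / (p * MSE) * h_ii / (1 - h_ii)^2"""
    n = len(y)
    D = np.column_stack([np.ones(n), X])
    H = D @ np.linalg.inv(D.T @ D) @ D.T      # hat (projection) matrix
    h = np.diag(H)                            # leverage of each case
    resid = y - H @ y
    p = D.shape[1]                            # number of parameters
    mse = resid @ resid / (n - p)
    return resid**2 / (p * mse) * h / (1 - h) ** 2

# made-up data with one deliberately corrupted, influential case
rng = np.random.default_rng(3)
x = rng.normal(size=50)
y = 1 + 2 * x + rng.normal(scale=0.2, size=50)
y[0] += 10                                    # make case 0 highly influential
d = cooks_distance(x[:, None], y)
print(d.argmax())  # index of the most influential case (the corrupted one, 0)
```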
‘variable type’ regression assumption
outcome must be a continuous variable
predictors can be continuous or dichotomous
‘non zero variance’ regression assumption
predictors cannot have 0 variance
‘linearity’ regression assumption
relationship must be linear
‘independence’ regression assumption
all values come from a different person (observations are independent)
‘No Multicollinearity’ regression assumption
predictors must not be highly correlated
check with collinearity diagnostics - values should be:
TOLERANCE > 0.2
VIF < 10
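the VIF for a predictor is 1 / (1 - R^2) from regressing that predictor on all the others, and tolerance is 1 / VIF; a minimal sketch with made-up predictors, one of which is deliberately collinear:

```python
import numpy as np

def vif(X, j):
    """Variance inflation factor for predictor j (sketch):
    regress X_j on the remaining predictors, then VIF = 1 / (1 - R^2)."""
    target = X[:, j]
    others = np.delete(X, j, axis=1)
    design = np.column_stack([np.ones(len(X)), others])
    b, *_ = np.linalg.lstsq(design, target, rcond=None)
    resid = target - design @ b
    tss = ((target - target.mean()) ** 2).sum()
    r2 = 1 - (resid @ resid) / tss
    return 1 / (1 - r2)

rng = np.random.default_rng(4)
x1 = rng.normal(size=300)
x2 = rng.normal(size=300)                    # independent of x1
x3 = x1 + rng.normal(scale=0.1, size=300)    # nearly a copy of x1
X = np.column_stack([x1, x2, x3])

print(vif(X, 1))  # near 1: x2 shares little variance with the others
print(vif(X, 2))  # well above 10: x3 is highly collinear with x1
```

a VIF near 1 means the predictor is nearly independent of the rest; large values mean its coefficient estimate is unstable.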
‘Homoscedasticity’ regression assumption
for each value of the predictors, the variance of the error terms should be constant
check with a plot of ZRESID against ZPRED - points should be randomly and evenly dispersed, with no funnel shape
‘independent errors’ regression assumption
for any two observations, the error terms should be uncorrelated
‘normally distributed errors’ regression assumption
residuals should be normally distributed - check with a histogram and normal probability plot of the standardised residuals