Week 3 (regressions and control) Flashcards
what is regression to the mean
- if one sample of a random variable is extreme, the next sample of the same random variable is likely to be less extreme
- this is because an extreme score is partly the product of chance (random error), and that chance component is unlikely to repeat, so on remeasurement the score will tend to fall closer to the mean
- therefore it can be hard to tell whether an intervention is effective or whether the results are just regression to the mean
- interventions should therefore be compared against control groups that don't receive the intervention
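A minimal sketch of this idea in R (all numbers and names are illustrative): each "person" has a true score, each test adds random noise, and people selected for extreme first scores tend to score closer to the mean the second time.

```r
# Sketch: simulating regression to the mean (illustrative values)
set.seed(42)
n <- 10000
true_score <- rnorm(n, mean = 100, sd = 10)
test1 <- true_score + rnorm(n, sd = 10)  # first measurement: true score + noise
test2 <- true_score + rnorm(n, sd = 10)  # second measurement, fresh noise

extreme <- test1 > 125   # select people with extreme first scores
mean(test1[extreme])     # well above 100
mean(test2[extreme])     # noticeably closer to 100: regression to the mean
```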
what is a multiple linear regression?
a linear regression with multiple predictor variables
what is the formula for a multiple linear regression
- the same as for simple linear regression, except every predictor you include adds its own term with its own coefficient
- therefore Y = the intercept + each predictor variable multiplied by its coefficient
- Y = b0 + b1X1 + b2X2 + … + bnXn + e
Y = predicted value of the dependent (outcome) variable
b0 = y-intercept, i.e. the value of Y when all predictors are set to 0
b1 = regression coefficient of the first predictor, X1
bn = regression coefficient of the last predictor, Xn
e = residual (model error), the variation the model doesn't explain
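A minimal sketch of fitting a multiple regression in R with lm(), using the built-in mtcars data (the choice of predictors is purely illustrative):

```r
# Sketch: multiple linear regression with two predictors (illustrative)
model <- lm(mpg ~ wt + hp, data = mtcars)  # Y = b0 + b1*wt + b2*hp + e
summary(model)  # coefficients, R^2, F-test
coef(model)     # b0 (intercept), b1 (wt), b2 (hp)
```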
what is SSt when there are multiple predictors?
SSt (the total sum of squares) still represents the sum of squared differences between the observed values and the mean value of the outcome variable
what is SSr when there are multiple predictors?
SSr (the residual sum of squares) still represents the sum of squared differences between the values of Y predicted by the model and the observed values
what is SSm when there are multiple predictors?
SSm (the model sum of squares) represents the sum of squared differences between the values of Y predicted by the model and the mean value
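These can be computed directly from a fitted model; a sketch in R, reusing the illustrative mtcars model from above:

```r
# Sketch: the three sums of squares for a fitted model (illustrative data)
model <- lm(mpg ~ wt + hp, data = mtcars)
y     <- mtcars$mpg
y_hat <- fitted(model)

SSt <- sum((y - mean(y))^2)      # total: observed vs. mean
SSr <- sum((y - y_hat)^2)        # residual: observed vs. predicted
SSm <- sum((y_hat - mean(y))^2)  # model: predicted vs. mean
c(SSt = SSt, SSr = SSr, SSm = SSm)  # note SSt = SSm + SSr
```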
what is multiple R^2?
the square of the correlation between the observed values of Y and the values of Y predicted by the multiple regression model
therefore large values of multiple R^2 represent a large correlation between the predicted and observed values of the outcome
a value of 1 would mean the model perfectly fit the data
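A sketch of reading multiple R^2 off a fitted model in R, and checking that it really is the squared correlation between observed and predicted values (same illustrative mtcars model):

```r
# Sketch: multiple R^2 two ways (illustrative data)
model <- lm(mpg ~ wt + hp, data = mtcars)
summary(model)$r.squared          # R^2 as reported by R
cor(mtcars$mpg, fitted(model))^2  # identical: squared correlation of
                                  # observed vs. predicted outcome values
```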
do models with more or fewer variables tend to have a larger R^2?
models with more variables always have an R^2 at least as large: adding a predictor can never decrease R^2, even if the predictor adds nothing meaningful
what is the Akaike information criterion (AIC)?
- the problem with multiple R^2 is that the more variables a model has, the larger its R^2 value will be
- AIC is a measure of fit which penalizes the model for having more variables
- A larger AIC value indicates a worse fit, corrected for the number of variables
- it only makes sense to compare AIC between models of the same data, as it is a relative measure with no meaningful absolute scale
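A sketch of comparing AIC for two models of the same data in R (mtcars again, predictor choices illustrative):

```r
# Sketch: comparing AIC across two models of the same data
small <- lm(mpg ~ wt, data = mtcars)
large <- lm(mpg ~ wt + hp + disp, data = mtcars)
AIC(small, large)  # lower AIC = better fit after penalizing extra terms
```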
what is hierarchical regression?
- predictors are chosen based on past research, and the experimenter decides the order in which they are entered into the model
- they are entered in order of importance in predicting the outcome (see the sketch below)
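One common way to run hierarchical entry in R is to add predictors in blocks and compare each step to the last with anova(); this is a hedged sketch of that approach, not the only way to do it (mtcars names are illustrative):

```r
# Sketch: hierarchical entry, most important (known) predictor first
step1 <- lm(mpg ~ wt, data = mtcars)  # block 1: established predictor
step2 <- update(step1, . ~ . + hp)    # block 2: add the new predictor
anova(step1, step2)  # does adding hp significantly improve the fit?
```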
what is forced entry regression?
- all predictors are entered into the model simultaneously
- some believe this is the only appropriate technique for theory testing
what is stepwise regression?
- decisions about the order in which predictors are entered into the model are based on a purely mathematical criterion
how does R carry out a forward stepwise regression?
- an initial model is defined that contains only the constant
- the computer then picks the variable that has the biggest simple correlation with the outcome and calculates how much of the variation in the outcome that variable explains
- the model then searches for a second predictor that can explain the biggest part of the remaining variance
- this gives a measure of how much 'new' variance in the outcome can be explained by each remaining predictor
- the model always picks next the variable that can explain the largest amount of this new variance
- R has to decide when to stop adding predictors to the model, and it does this using the AIC criterion
- a variable is only added if it lowers the AIC, and when no remaining variable can lower the AIC any further the model stops (see the sketch below)
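A sketch of forward stepwise selection in R using step(), which selects on AIC (the set of candidate predictors is illustrative):

```r
# Sketch: forward stepwise regression driven by AIC
null_model <- lm(mpg ~ 1, data = mtcars)  # initial model: constant only
forward <- step(null_model,
                scope = ~ wt + hp + disp + drat,  # candidate predictors
                direction = "forward")
formula(forward)  # predictors added only while each addition lowered AIC
```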
how is a backward stepwise regression different from a forward or 'both' stepwise regression?
- the forward model is where predictor variables are added until none can lower the AIC any further
- the backward model is where the computer begins by placing all predictors in the model and then checks, for each variable, whether the AIC goes down when it is removed; this continues until removing any variable would cause the AIC to increase
- in the 'both' model this goes in both directions: each time a predictor is added to the model, a removal test is made of the least useful predictor (see the sketch below)
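A sketch of the backward and 'both' directions with step() in R (the full model's predictors are illustrative):

```r
# Sketch: backward and both-direction stepwise selection via AIC
full <- lm(mpg ~ wt + hp + disp + drat, data = mtcars)
backward <- step(full, direction = "backward")  # drop terms while AIC falls
both     <- step(full, direction = "both")      # allow drops and re-additions
```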
why is the backward stepwise regression preferable to the forward stepwise regression
- because of suppressor effects
- this occurs when a predictor has an effect but only if another variable is held constant
- forward selection is more likely than backward selection to exclude predictors involved in suppressor effects, because a suppressor has a weak simple correlation with the outcome and so may never be entered, whereas backward selection starts with it already in the model (see the sketch below)
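A sketch of a suppressor effect simulated in R (all variables are artificial): x2 barely correlates with the outcome on its own, yet including it alongside x1 soaks up x1's measurement noise and improves the fit substantially.

```r
# Sketch: simulating a suppressor effect (artificial data)
set.seed(1)
n     <- 1000
z     <- rnorm(n)    # the true signal
noise <- rnorm(n)
x1    <- z + noise   # measures the signal, contaminated by noise
x2    <- noise       # the suppressor: measures only the noise
y     <- z + rnorm(n, sd = 0.5)

cor(x2, y)                          # near zero: forward selection may skip x2
summary(lm(y ~ x1))$r.squared       # x1 alone
summary(lm(y ~ x1 + x2))$r.squared  # much higher: x2 suppresses x1's noise
```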