Week 9: Multiple regression analysis Flashcards
What variables do we have when doing multiple linear regression?
More than one predictor variable
One response variable
What happens if the data is non-linear?
Fitting a straight line through curved data gives a poorer fit to the data, which leads to an underestimate of the strength of the relationship
How do we check for non-linearity of data?
Regression - Correlation Matrix, then look at the scatterplots
What things may make it hard to identify trends in the data?
Attenuated (restricted) range or underdispersion of scores (scores clustered together)
What is homoscedasticity?
Means that the error variance should be the same at each level of the predictor variable
Heteroskedasticity tests?
You can ask for normality tests under assumptions
These test the null hypothesis of homoscedasticity - a significant result means the assumption is violated
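One common test of this null is the Breusch-Pagan test. The sketch below is an illustration only, assuming NumPy is available; it is not necessarily the test jamovi runs. It computes the studentized (Koenker) form of the statistic, n × R² from regressing the squared residuals on the predictor, and the data and the name `breusch_pagan_lm` are made up. With one predictor, a value above about 3.84 (the chi-square critical value for 1 df at alpha = .05) would be significant.

```python
import numpy as np

def breusch_pagan_lm(x, y):
    """Return n * R^2 from regressing squared OLS residuals on the predictor(s)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    design = np.column_stack([np.ones(len(y)), x])
    # Step 1: fit the original regression and keep the squared residuals.
    coefs, *_ = np.linalg.lstsq(design, y, rcond=None)
    squared_resid = (y - design @ coefs) ** 2
    # Step 2: regress the squared residuals on the same predictors.
    aux_coefs, *_ = np.linalg.lstsq(design, squared_resid, rcond=None)
    aux_resid = squared_resid - design @ aux_coefs
    centered = squared_resid - squared_resid.mean()
    aux_r2 = 1 - (aux_resid @ aux_resid) / (centered @ centered)
    return len(y) * aux_r2

# Made-up data: errors alternate in sign; in the second outcome they grow with x.
x = np.arange(1, 21)
signs = np.where(x % 2 == 1, 1.0, -1.0)
lm_constant = breusch_pagan_lm(x, x + signs)      # same error spread at every x
lm_growing = breusch_pagan_lm(x, x + signs * x)   # error spread grows with x
print(round(float(lm_constant), 2), round(float(lm_growing), 2))
```

Only the outcome whose error spread grows with x should exceed the critical value.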
What are the assumptions of linear regression (6)?
Use correct variables (interval data)
Independence of data
Sample size
Normality
Linearity
Homoscedasticity
How is the linear regression line calculated?
By minimizing the sum of squared differences between observed and predicted values
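A minimal sketch of that least-squares idea, assuming NumPy is available. The data are made up and the function names (`fit_ols`, `sum_squared_residuals`) are hypothetical; `np.linalg.lstsq` finds the coefficients that minimize the sum of squared differences between observed and predicted values.

```python
import numpy as np

def fit_ols(X, y):
    """Return [intercept, b1, ..., bk] that minimize the sum of squared residuals."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    design = np.column_stack([np.ones(len(y)), X])  # add the intercept column
    coefs, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coefs

def sum_squared_residuals(X, y, coefs):
    """Sum of squared differences between observed and predicted values."""
    design = np.column_stack([np.ones(len(y)), np.asarray(X, dtype=float)])
    residuals = np.asarray(y, dtype=float) - design @ coefs
    return float(residuals @ residuals)

# Made-up data with two predictors, built so that y = 1 + 2*x1 + 3*x2 exactly.
X = np.array([[0, 0], [1, 0], [0, 1], [1, 1], [2, 1]])
y = 1 + 2 * X[:, 0] + 3 * X[:, 1]
coefs = fit_ols(X, y)
print(np.round(coefs, 3))
```

Any other set of coefficients gives a larger sum of squared residuals on the same data.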
How do you test for outliers (2 ways)?
1. Basic approach (any residual > 3 SDs away from the mean)
2. Cook's distance (a measure of the influence of one case on the model as a whole) - under assumption checks
What Cook's distance value is concerning?
> 1 may be a concern
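Both checks can be sketched in a few lines, assuming NumPy is available. The data and the name `outlier_diagnostics` are made up, and the residuals are standardized simply by dividing by the residual standard error, which is an assumption about how "3 SDs" is applied. Cook's distance uses the standard formula based on each case's residual and leverage.

```python
import numpy as np

def outlier_diagnostics(X, y):
    """Return (standardized residuals, Cook's distances) for an OLS fit."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    design = np.column_stack([np.ones(len(y)), X])
    n, p = design.shape                       # p counts the intercept too
    hat = design @ np.linalg.inv(design.T @ design) @ design.T
    leverage = np.diag(hat)
    residuals = y - hat @ y
    mse = residuals @ residuals / (n - p)
    standardized = residuals / np.sqrt(mse)
    cooks = residuals**2 / (p * mse) * leverage / (1 - leverage) ** 2
    return standardized, cooks

# Made-up data: ten cases on the line y = x, plus one extreme case at x = 30.
x = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 30], dtype=float)
y = x.copy()
y[-1] = 0.0
standardized, cooks = outlier_diagnostics(x, y)
flag_residual = np.abs(standardized) > 3     # basic approach
flag_cooks = cooks > 1                       # Cook's distance approach
print(np.round(cooks, 2))
```

In this made-up data the last case has a Cook's distance well above 1 even though no residual is beyond 3 SDs, because a highly influential case pulls the fitted line toward itself.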
Multiple regression techniques are more sensitive to…
violations of these assumptions than single regression
Why should we use a correlation matrix to check the data?
Correlations tell us which IVs are related to the DV, but they also show whether some of the IVs are related to each other
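A minimal sketch of a correlation matrix, assuming NumPy is available. The variables `iv1`, `iv2` and `dv` are made up for illustration; `np.corrcoef` with `rowvar=False` treats each column as a variable.

```python
import numpy as np

# Made-up data: iv2 is almost a copy of iv1, and dv depends on iv1.
iv1 = np.array([1, 2, 3, 4, 5, 6, 7, 8], dtype=float)
iv2 = iv1 + np.array([0.5, -0.5, 0.5, -0.5, 0.5, -0.5, 0.5, -0.5])
dv = 2 * iv1 + np.array([1, -1, -1, 1, 1, -1, -1, 1], dtype=float)

data = np.column_stack([iv1, iv2, dv])
corr = np.corrcoef(data, rowvar=False)   # 3 x 3 matrix of Pearson correlations

for label, row in zip(["iv1", "iv2", "dv"], corr):
    print(label, np.round(row, 2))
```

The iv1 row shows a strong correlation with dv (useful) and with iv2 (a warning that the two IVs overlap).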
What is it called when IVs are correlated with each other?
Collinearity
What does collinearity mean in terms of data?
Means that some of the predictors provide little unique information
How do you run a multiple regression in jamovi?
Regression - linear regression
How do you set up jamovi to run a simultaneous multiple regression?
Under model builder:
Put all IVs in a single box
What are the collinearity statistics?
Tolerance
Variance inflation factor (VIF)
What tolerance values are a problem?
<0.1 are a clear problem
What variance inflation factors are a problem?
VIF is the inverse of tolerance (1 / tolerance); values > 10 are a problem
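A minimal sketch of how these two statistics are computed, assuming NumPy is available. Tolerance for a predictor is 1 minus the R² obtained when that predictor is regressed on all the other predictors, and VIF is 1 / tolerance. The data and the function names are made up.

```python
import numpy as np

def r_squared(X, y):
    """R^2 from an OLS regression of y on the columns of X (with intercept)."""
    design = np.column_stack([np.ones(len(y)), X])
    coefs, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ coefs
    centered = y - y.mean()
    return 1 - (residuals @ residuals) / (centered @ centered)

def collinearity_stats(X):
    """Return (tolerance, VIF) arrays, one value per predictor column."""
    X = np.asarray(X, dtype=float)
    tolerance = np.array([
        1 - r_squared(np.delete(X, j, axis=1), X[:, j])  # predictor j on the rest
        for j in range(X.shape[1])
    ])
    return tolerance, 1 / tolerance

# Made-up predictors: x3 is nearly a copy of x1, x2 is unrelated to both.
x1 = np.array([1, 2, 3, 4, 5, 6, 7, 8], dtype=float)
x2 = np.array([1, -1, -1, 1, 1, -1, -1, 1], dtype=float)
x3 = x1 + np.array([0.1, -0.1, 0.1, -0.1, 0.1, -0.1, 0.1, -0.1])
tolerance, vif = collinearity_stats(np.column_stack([x1, x2, x3]))
print(np.round(tolerance, 3), np.round(vif, 1))
```

Here x1 and x3 fall below the 0.1 tolerance cut-off (VIF above 10), while x2 has a tolerance close to 1.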
Simultaneous multiple regression is…
Theory-free
What kind of multiple regression is driven by theory?
Hierarchical multiple regression
E.g. you want to know what a certain predictor adds to the prediction of an outcome variable beyond the amount already explained by another predictor
How do you set up jamovi for a hierarchical multiple regression?
Using model builder - put the most important predictors in block 1, then put the subsequent predictors (those whose added contribution to the model you want to know) in block 2
What does the output from a hierarchical multiple regression tell us?
Gives the change in R² (ΔR²), which tells us how much more variation in the outcome is explained by adding the subsequent predictor(s)
Also gives a model comparisons box with an F test of the change (ΔF), which tells you whether the newer model explains significantly more variance than the previous one
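The change in R² and its F statistic can be sketched as follows, assuming NumPy is available. The data and the function name `r2_change` are made up. The F for the change uses the standard formula: (ΔR² / number of added predictors) divided by ((1 - R² of the larger model) / residual df of the larger model). No p-value is computed here; the statistic would be compared against an F distribution.

```python
import numpy as np

def r_squared(X, y):
    """R^2 from an OLS regression of y on the columns of X (with intercept)."""
    design = np.column_stack([np.ones(len(y)), X])
    coefs, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ coefs
    centered = y - y.mean()
    return 1 - (residuals @ residuals) / (centered @ centered)

def r2_change(block1, block2, y):
    """Compare model 1 (block 1 only) with model 2 (block 1 + block 2)."""
    y = np.asarray(y, dtype=float)
    b1 = np.asarray(block1, dtype=float).reshape(len(y), -1)
    b2 = np.asarray(block2, dtype=float).reshape(len(y), -1)
    both = np.column_stack([b1, b2])
    r2_model1 = r_squared(b1, y)
    r2_model2 = r_squared(both, y)
    df_added = b2.shape[1]                    # predictors added in block 2
    df_resid = len(y) - both.shape[1] - 1     # residual df of model 2
    f_change = ((r2_model2 - r2_model1) / df_added) / ((1 - r2_model2) / df_resid)
    return r2_model1, r2_model2, r2_model2 - r2_model1, f_change

# Made-up data: the outcome depends on x1 (block 1) and x2 (block 2).
x1 = np.arange(1, 11, dtype=float)
x2 = np.array([1, -1, -1, 1, 1, -1, -1, 1, 1, -1], dtype=float)
noise = np.array([0.2, -0.1, 0.1, -0.2, 0.1, 0.2, -0.2, -0.1, 0.1, -0.1])
y = 1 + 2 * x1 + 3 * x2 + noise
r2_model1, r2_model2, delta_r2, f_change = r2_change(x1, x2, y)
print(round(float(r2_model1), 3), round(float(r2_model2), 3), round(float(delta_r2), 3))
```

A larger ΔR² with a large F for the change means the block 2 predictor explains variation that block 1 did not.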
What type of linear regression would we use to determine the simplest possible model?
Stepwise multiple regression
How do we do a stepwise multiple regression analysis?
Look at the output from the simultaneous multiple regression and order the predictors by statistical significance
Forward stepwise: put the best predictor into the model first, then enter more predictors only if they improve the quality of the model by significantly increasing R²
Backward stepwise removal: start with all predictors in the model and remove the worst predictors one at a time until removing another would harm the quality of the predictive model by significantly reducing R²
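A rough sketch of the forward procedure, assuming NumPy is available. The data and the name `forward_stepwise` are made up. As a stand-in for a significance test, a predictor is entered only if the F for its R² change is at least `f_to_enter`; the default of 4.0 is an illustrative assumption, not a jamovi setting.

```python
import numpy as np

def r_squared(X, y):
    """R^2 from an OLS regression of y on the columns of X (with intercept)."""
    design = np.column_stack([np.ones(len(y)), X])
    coefs, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ coefs
    centered = y - y.mean()
    return 1 - (residuals @ residuals) / (centered @ centered)

def forward_stepwise(X, y, f_to_enter=4.0):
    """Enter predictors one at a time while the F for the R^2 change is large enough."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, k = X.shape
    selected, current_r2 = [], 0.0
    while len(selected) < k:
        # R^2 of the model if each remaining predictor were added next.
        trial = {j: r_squared(X[:, selected + [j]], y)
                 for j in range(k) if j not in selected}
        best = max(trial, key=trial.get)
        df_resid = n - (len(selected) + 1) - 1
        if df_resid <= 0:
            break
        f_change = (trial[best] - current_r2) / ((1 - trial[best]) / df_resid)
        if f_change < f_to_enter:
            break                             # no remaining predictor helps enough
        selected.append(best)
        current_r2 = trial[best]
    return selected, current_r2

# Made-up data: the outcome depends on columns 0 and 1 but not on column 2.
a = np.array([1, 1, 1, 1, -1, -1, -1, -1], dtype=float)
b = np.array([1, 1, -1, -1, 1, 1, -1, -1], dtype=float)
c = np.array([1, -1, 1, -1, 1, -1, 1, -1], dtype=float)
y = 10 + 3 * a + 2 * b + 0.5 * (a * b)       # a*b stands in for unexplained variation
selected, final_r2 = forward_stepwise(np.column_stack([a, b, c]), y)
print(selected, round(float(final_r2), 3))
```

Backward removal is the mirror image: start with every column selected and drop the predictor whose removal reduces R² the least, stopping once a removal would reduce R² significantly.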
What are the two kinds of stepwise multiple regression?
Forward stepwise
Backward stepwise
What do you need to be careful of with stepwise multiple regression?
It is data-driven rather than theory-driven so you need to make sure that the outcome still makes sense