Multiple Linear Regression Flashcards
What is multiple regression?
A technique that uses two or more predictor variables to predict a single outcome variable. It estimates the relationship between the set of predictors and the outcome while accounting for the interrelationships among the predictors.
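In equation form, the model is: predicted Y = B0 + B1X1 + B2X2 + … + BkXk, where B0 is the intercept and each remaining B is the coefficient for one predictor.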
What questions do we try to answer using multiple regression?
●How well does the combination of the predictor variables predict the outcome variable?
●What is the contribution of each of the predictor variables to the model?
What information does regression analysis provide?
●Variance explained by the model (R and R squared) - Model Summary table
●How well the model represents the data compared to using the mean (the F-test and its p-value) - ANOVA table
●How well each predictor is working within the model (B0 and the B value for each predictor) - Coefficients table
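As an illustration, a minimal sketch in Python using statsmodels; the data and variable names (hours_studied, sleep_hours, exam_score) are hypothetical:

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

# Simulated (hypothetical) data: exam score predicted from study and sleep hours
rng = np.random.default_rng(42)
df = pd.DataFrame({
    "hours_studied": rng.uniform(0, 10, 100),
    "sleep_hours": rng.uniform(4, 9, 100),
})
df["exam_score"] = (40 + 3 * df["hours_studied"] + 2 * df["sleep_hours"]
                    + rng.normal(0, 5, 100))

X = sm.add_constant(df[["hours_studied", "sleep_hours"]])  # adds the intercept (B0)
model = sm.OLS(df["exam_score"], X).fit()

print(model.rsquared)   # R squared - Model Summary table
print(model.f_pvalue)   # overall F-test p-value - ANOVA table
print(model.params)     # B0 and the B for each predictor - Coefficients table
```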
What are the data type requirements for multiple regression variables?
●Independent Variables: Continuous or dichotomous; more than one IV.
●Dependent Variable: Continuous. If the outcome is dichotomous, use chi-square or logistic regression instead.
What are the different methods of constructing regression models?
Forced, stepwise, and hierarchical. These methods differ in how they enter predictors into the model.
Explain the forced entry method in multiple regression.
All predictors are entered into the model simultaneously. Each coefficient reflects only the unique relationship between that predictor and the outcome (the overlap shared with the other predictors is excluded).
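A hypothetical sketch contrasting forced entry (the statsmodels fit above) with hierarchical entry, reusing df from the earlier example; hierarchical entry adds predictors in theory-driven blocks and inspects the change in R squared at each step:

```python
import statsmodels.api as sm

step1 = sm.OLS(df["exam_score"],
               sm.add_constant(df[["hours_studied"]])).fit()  # block 1
step2 = sm.OLS(df["exam_score"],
               sm.add_constant(df[["hours_studied", "sleep_hours"]])).fit()  # blocks 1 + 2

print(step2.rsquared - step1.rsquared)  # R squared change from adding sleep_hours
```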
Which coefficients are used in the regression table and for reporting and comparing predictors?
Unstandardised coefficients (B) appear in the regression table and in the prediction equation; standardised coefficients (beta) are used when reporting and comparing the relative importance of predictors.
Define the terms R and R-squared in multiple regression.
●R: Multiple correlation coefficient - the correlation between the observed outcome values and the values predicted by the model.
●R-squared: Coefficient of multiple determination - the proportion of variance in the outcome explained by the model.
List the assumptions of multiple regression
●Variable type: Outcome variable must be continuous (interval); predictors should be continuous (interval) but can be nominal with two levels.
●Non-zero variance: Variables must have some variability.
●Sufficient power: Enough participants to provide sufficient data (40 + 10k, where k is the number of predictors; e.g., 3 predictors require 40 + 30 = 70 participants).
●Linear relationship: The relationship between each predictor and the outcome should be linear; assess this visually with scatter plots.
●Normally distributed residuals (errors): Residuals should be random and normally distributed with a mean of zero.
●Homoscedasticity: The variance of the errors should be roughly constant across all levels of the predictor variables.
●Independence of errors: The residuals for any two observations should be uncorrelated (no autocorrelation), e.g., all outcome values come from different individuals. The Durbin-Watson statistic should fall between 1.5 and 2.5 (see the residual checks sketched after this list).
●No excessive multicollinearity: The relationship/overlap between the IVs should not be excessive.
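A minimal sketch of the residual checks (mean, normality, Durbin-Watson), assuming the fitted model from the earlier statsmodels example:

```python
from statsmodels.stats.stattools import durbin_watson
from scipy import stats

residuals = model.resid

print(residuals.mean())          # should be close to zero
print(stats.shapiro(residuals))  # Shapiro-Wilk test of normality
print(durbin_watson(residuals))  # should fall between 1.5 and 2.5
```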
How do you check for multicollinearity?
●Correlate all IVs; a correlation of .80 or above indicates an issue.
●Check collinearity statistics: Tolerance should be greater than .2, and VIF should be lower than 10.
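A minimal sketch of these diagnostics, assuming df and the predictor matrix X (with its constant column) from the earlier statsmodels example; tolerance is the reciprocal of VIF:

```python
from statsmodels.stats.outliers_influence import variance_inflation_factor

print(df[["hours_studied", "sleep_hours"]].corr())  # flag any r >= .80

for i, name in enumerate(X.columns):
    if name == "const":
        continue                                    # skip the intercept column
    vif = variance_inflation_factor(X.values, i)
    tolerance = 1 / vif                             # tolerance = 1 / VIF
    print(name, vif, tolerance)                     # flag VIF >= 10 or tolerance <= .2
```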
Explain the difference between standardised and unstandardised beta values.
●Unstandardised (B): In the original units of measurement; used for making predictions with the regression equation. For each one-unit increase in X, Y increases/decreases by B units, holding the other predictors constant.
●Standardised (beta): Unit-free, so values can be compared across predictor variables. For a one-SD increase in X, Y increases/decreases by beta SDs, holding the other predictors constant.
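A minimal sketch of obtaining standardised betas by z-scoring all variables before fitting, reusing the hypothetical df from the first example:

```python
import statsmodels.api as sm
from scipy.stats import zscore

z = df.apply(zscore)  # every column rescaled to mean 0, SD 1
std_model = sm.OLS(z["exam_score"],
                   sm.add_constant(z[["hours_studied", "sleep_hours"]])).fit()
print(std_model.params)  # standardised betas (the intercept is ~0)
```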
When choosing predictors for multiple regression, what is the aim and what factors should be considered?
●Aim: To identify the most important variables for the model.
●Considerations: Theoretical background and existing literature should guide predictor selection.