Multiple Linear Regression Flashcards
How does multiple linear regression differ from simple linear regression?
▪️Includes more predictors/IVs
▪️Fits a regression plane (or hyperplane) instead of a line
y = β0 + β1x1 + β2x2
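A minimal sketch (Python, with made-up data; statsmodels/NumPy assumed available) of fitting such a model by ordinary least squares:

```python
# Minimal sketch: fitting y = β0 + β1*x1 + β2*x2 by ordinary least squares.
# The data below are made up purely for illustration.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
x1 = rng.normal(size=100)
x2 = rng.normal(size=100)
y = 2.0 + 1.5 * x1 - 0.8 * x2 + rng.normal(scale=0.5, size=100)

X = sm.add_constant(np.column_stack([x1, x2]))  # adds the intercept column (β0)
model = sm.OLS(y, X).fit()
print(model.params)     # estimates of β0, β1, β2 (partial regression coefficients)
print(model.summary())  # includes t-tests of H0: βi = 0 and R-squared
```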
When do you use multiple linear regression?
To study the relationship between one DV and two or more IVs simultaneously
What is the partial regression coefficient (βi)?
The amount Y will change for each unit increase in IV xi whilst holding all other variables constant
What is the null hypothesis for a multiple linear regression?
Holding all other variables constant, there is no linear association between y and xi (i.e. βi = 0)
What is a confounding variable?
Any variable that may distort the observed association between an explanatory variable and the outcome
Has an effect on both the explanatory variable and the outcome
What happens if you don’t take a confounder into account?
It introduces bias into the estimation of β1
How many confounders can you consider in a multiple regression model?
Usually one IV for each 10 observations
E.g. if n = 100, you can consider 10 IVs (the main exposure plus 9 confounders)
What is R-squared?
The coefficient of determination
Measure of how well regression line/hyperplane approximates real data points (goodness of fit)
What does an R-squared of 0 indicate?
Poor fit - the regression line would be perfectly horizontal (at the mean of Y), explaining none of the variance in Y
What does an R-squared of 1 indicate?
Perfect fit
What is R-squared in a simple linear regression?
Pearson’s coefficient squared (r-squared)
How do you interpret R-squared?
The proportion of variance in the DV that is “explained” by the IVs in the model
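A small sketch (made-up data, plain NumPy) showing R-squared computed as 1 − SS_res/SS_tot, i.e. the proportion of variance explained:

```python
# Sketch: R-squared = 1 - SS_res / SS_tot, the proportion of variance in Y
# explained by the IVs. Made-up data, plain NumPy least squares.
import numpy as np

rng = np.random.default_rng(1)
x1, x2 = rng.normal(size=100), rng.normal(size=100)
y = 1.0 + 2.0 * x1 + 0.5 * x2 + rng.normal(size=100)

X = np.column_stack([np.ones(100), x1, x2])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)  # least-squares coefficients
y_hat = X @ beta

ss_res = np.sum((y - y_hat) ** 2)     # variation left unexplained by the model
ss_tot = np.sum((y - y.mean()) ** 2)  # total variation in Y
print(1 - ss_res / ss_tot)            # R-squared
```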
How is R-squared interpreted in the context of prediction analysis?
How well the model will be able to predict values of Y based on observed values of the IVs
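A short sketch (illustrative data; the OLS fit is the same kind as in the earlier sketch) of using a fitted model to predict Y for new observations:

```python
# Sketch: a fitted model predicts Y for new observed values of the IVs.
# Illustrative data only.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
x1, x2 = rng.normal(size=100), rng.normal(size=100)
y = 1.0 + 2.0 * x1 + 0.5 * x2 + rng.normal(size=100)
model = sm.OLS(y, sm.add_constant(np.column_stack([x1, x2]))).fit()

new_X = sm.add_constant(np.array([[0.5, -1.0], [1.2, 0.3]]), has_constant="add")
print(model.predict(new_X))  # predicted Y for two new (x1, x2) observations
```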
What does R-squared NOT indicate?
▪️The IVs CAUSE changes in the DV
▪️Correct type of regression was used
▪️Most appropriate IVs were chosen
▪️There’s enough data for a solid conclusion
What is adjusted R-squared?
Modified to adjust for the number of IVs in the model.
R-squared increases whenever a new IV is added, regardless of how informative it is; adjusted R-squared only increases if the new IV improves the model more than would be expected by chance
What statistic is the better indicator for model selection?
Adjusted R-squared
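A sketch of the standard adjusted R-squared formula, assuming n observations and p IVs:

```python
# Sketch of the standard adjusted R-squared formula:
# adj_R2 = 1 - (1 - R2) * (n - 1) / (n - p - 1), with n observations and p IVs.
def adjusted_r_squared(r2: float, n: int, p: int) -> float:
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

print(adjusted_r_squared(0.40, n=100, p=2))   # ~0.388
print(adjusted_r_squared(0.40, n=100, p=20))  # ~0.248 - more IVs, bigger penalty
```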
What do we assume about the relationship of variables for multiple regression inference?
It’s linear
What does a partial residual plot show?
The net relationship between X1 and Y where the influence of other variables is partialled out
What do we assume about the residuals/error terms for multiple regression inference?
They’re approximately normally distributed
How do we plot a partial residual plot?
Residuals of the DV against residuals of each IV separately (each set of residuals obtained by regressing on the remaining IVs)
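A sketch of that procedure for x1 (made-up data; statsmodels/matplotlib assumed available):

```python
# Sketch: regress Y on the other IVs, regress x1 on the other IVs, then plot
# one set of residuals against the other. The slope of this scatter equals
# the partial coefficient β1.
import numpy as np
import statsmodels.api as sm
import matplotlib.pyplot as plt

rng = np.random.default_rng(3)
x1, x2 = rng.normal(size=100), rng.normal(size=100)
y = 1.0 + 2.0 * x1 + 0.5 * x2 + rng.normal(size=100)

others = sm.add_constant(x2)
resid_y = sm.OLS(y, others).fit().resid    # part of Y not explained by x2
resid_x1 = sm.OLS(x1, others).fit().resid  # part of x1 not explained by x2

plt.scatter(resid_x1, resid_y)
plt.xlabel("x1 residuals")
plt.ylabel("Y residuals")
plt.title("Partial (added-variable) plot for x1")
plt.show()
```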
What is the residual error?
Variation in Y not explained by the predictors
How can we see whether the error terms are normally distributed?
▪️Histogram
▪️Normal P-P plot - residuals plotted against a theoretical normal distribution
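A sketch of both checks on stand-in residuals (statsmodels/matplotlib assumed available; in practice you would use the model's own residuals):

```python
# Sketch of the two normality checks on stand-in residuals.
import numpy as np
import matplotlib.pyplot as plt
from statsmodels.graphics.gofplots import ProbPlot

rng = np.random.default_rng(4)
residuals = rng.normal(size=200)  # stand-in for the model's residuals

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(8, 4))
ax1.hist(residuals, bins=20)                        # should look roughly bell-shaped
ax1.set_title("Histogram of residuals")
ProbPlot(residuals).ppplot(line="45", ax=ax2)       # points should hug the 45° line
ax2.set_title("Normal P-P plot")
plt.show()
```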
What is homoscedasticity?
▪️Stability in variance of residuals
▪️A scatterplot of standardised residuals and standardised predicted values shows no pattern
▪️Error values have same variance irrespective of x
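A sketch of that scatterplot check on made-up data:

```python
# Sketch: standardised predicted values (ZPRED) vs standardised residuals
# (ZRESID); a patternless cloud centred on zero is consistent with
# homoscedasticity.
import numpy as np
import statsmodels.api as sm
import matplotlib.pyplot as plt

rng = np.random.default_rng(5)
x1, x2 = rng.normal(size=100), rng.normal(size=100)
y = 1.0 + 2.0 * x1 + 0.5 * x2 + rng.normal(size=100)

fit = sm.OLS(y, sm.add_constant(np.column_stack([x1, x2]))).fit()
z_pred = (fit.fittedvalues - fit.fittedvalues.mean()) / fit.fittedvalues.std()
z_resid = fit.resid / fit.resid.std()

plt.scatter(z_pred, z_resid)
plt.axhline(0)
plt.xlabel("Standardised predicted values (ZPRED)")
plt.ylabel("Standardised residuals (ZRESID)")
plt.show()
```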
Observations used for regression models must be ___________
Independent
(can’t use repeated measurements, paired data or matched data)
What assumptions do we have to check for when interpreting multiple linear regression?
▪️Linearity
▪️Normal distribution of residuals
▪️Homoscedasticity (ZPred vs Zresid)
R-squared is a coefficient for what?
Measuring how well the regression line/hyperplane approximates the real data points