Practical 7: Multiple linear regression Flashcards
How does multiple regression model plot two or more explanatory variables?
Rather than a regression line on a scatter plot - a regression plane is used
When is multiple linear regression used?
To see how a set of independent variables relates to one dependent variable
What is a partial regression coefficient?
Bi is the change in y when xi increases by one unit, with all other independent variables held constant
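A minimal sketch of this idea with numpy on hypothetical simulated data: fitting the plane by ordinary least squares, so each coefficient estimates the change in y per one-unit increase in its x, holding the other x constant.

```python
import numpy as np

# Hypothetical simulated data: y = 2 + 1.5*x1 - 0.5*x2 + noise
rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 2.0 + 1.5 * x1 - 0.5 * x2 + rng.normal(scale=0.1, size=n)

# Design matrix with an intercept column; least squares gives b0, b1, b2
X = np.column_stack([np.ones(n), x1, x2])
b, *_ = np.linalg.lstsq(X, y, rcond=None)

# b[1] estimates the change in y for a one-unit increase in x1,
# holding x2 constant (the partial regression coefficient for x1)
print(b)
```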
In MLR the independent variables (also called explanatory variables, predictors or covariates) x1, x2, ... can be continuous or categorical
y, the dependent variable, must be continuous for a linear model to be used. If y is categorical another model (e.g. logistic regression) is needed
What is the null hypothesis in MLR?
When holding all other variables constant there is no linear association between y and xi (i.e. Bi = 0)
If p < 0.05 we reject this and conclude that Bi is significantly different from 0 at the population level
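A sketch of the test behind that p-value, on hypothetical simulated data: each coefficient's t-statistic is b_i divided by its standard error, and |t| around 2 or more corresponds to p < 0.05 for moderate sample sizes.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
# x1 has a strong true effect; x2 has a true coefficient of 0
y = 1.0 + 2.0 * x1 + 0.0 * x2 + rng.normal(size=n)

X = np.column_stack([np.ones(n), x1, x2])
b, *_ = np.linalg.lstsq(X, y, rcond=None)

resid = y - X @ b
p = X.shape[1]                        # number of estimated coefficients
s2 = resid @ resid / (n - p)          # residual variance estimate
se = np.sqrt(s2 * np.diag(np.linalg.inv(X.T @ X)))
t = b / se                            # compare |t| to ~2 for p < 0.05
print(t)
```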
If a B1 is not significant what can we conclude?
That the independent variable in relation to B1 is not statistically significantly related to the dependent variable when the other variables are held constant
- we cannot generalise this variable to the population
How do we see if there is confounding?
Plot a scatterplot between independent variables –> is there a linear relationship between them?
Confounding is when one independent variable affects both another independent variable and the dependent variable
When not taken into account, confounders may bias the estimated slopes
How do we adjust for confounders?
Include them in the multiple linear regression model
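A sketch of why including the confounder works, on hypothetical simulated data where z affects both x1 and y: the crude slope for x1 is biased, while the slope from the model that also includes z recovers the true effect.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 500
z = rng.normal(size=n)                                   # confounder
x1 = 0.8 * z + rng.normal(scale=0.5, size=n)             # z affects x1
y = 1.0 * x1 + 2.0 * z + rng.normal(scale=0.5, size=n)   # z affects y too

def ols(X, y):
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    return b

ones = np.ones(n)
b_crude = ols(np.column_stack([ones, x1]), y)      # z omitted -> biased slope
b_adj = ols(np.column_stack([ones, x1, z]), y)     # z included -> adjusted slope
print(b_crude[1], b_adj[1])                        # true effect of x1 is 1.0
```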
How can sample size restrict the number of variables included in a MLR?
Generally we need at least 10 observations per independent variable
i.e. with 100 observations we can include up to 10 independent variables
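The rule of thumb above as a tiny helper (the function name and default are illustrative, not a standard API):

```python
# Rule of thumb from the card above: ~10 observations per independent variable
def max_predictors(n_observations: int, obs_per_predictor: int = 10) -> int:
    """Upper bound on the number of predictors a sample can support."""
    return n_observations // obs_per_predictor

print(max_predictors(100))  # -> 10
```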
What does a R-squared value of 0 indicate?
An R-squared of 0 indicates the regression plane is horizontal and doesn't fit the data points –> the model doesn't explain the dependent variable at all
An R-squared of 1 would be the opposite (the model explains all the variance)
(R-squared is the proportion of variance in the dependent variable that is "explained" by the independent variables)
Does R-square indicate if one variable is explained by another?
No - R-squared just tells us how much of the variance of the dependent variable is accounted for by our model (the independent variables)
- Also doesn’t indicate if the correct model was used
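A sketch of the R-squared computation itself, on hypothetical simulated data: R² = 1 − SS_res / SS_tot, i.e. one minus the unexplained share of the variation in y.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 200
x = rng.normal(size=n)
y = 3.0 * x + rng.normal(size=n)   # true R-squared is about 0.9 here

X = np.column_stack([np.ones(n), x])
b, *_ = np.linalg.lstsq(X, y, rcond=None)
fitted = X @ b

ss_res = np.sum((y - fitted) ** 2)       # variation left unexplained
ss_tot = np.sum((y - y.mean()) ** 2)     # total variation in y
r_squared = 1.0 - ss_res / ss_tot        # proportion explained by the model
print(r_squared)
```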
How does Adjusted R-square differ to R-square?
Adjusted R-squared accounts for the chance increase in R-squared seen whenever another variable is added to the model
Therefore it is a more robust estimate of how much of the variance in the dependent variable is explained by our model.
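The standard adjustment, with n observations and p predictors, is adjusted R² = 1 − (1 − R²)(n − 1)/(n − p − 1). A small sketch showing the penalty for extra predictors:

```python
def adjusted_r_squared(r2: float, n: int, p: int) -> float:
    """Penalise R-squared for the number of predictors p given n observations."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)

# Same R-squared, more predictors -> lower adjusted R-squared
print(adjusted_r_squared(0.50, n=100, p=2))
print(adjusted_r_squared(0.50, n=100, p=20))
```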
What are the assumptions of MLR?
- The relationship between Y and each CONTINUOUS independent variable is linear
  - First plot a scatterplot of the residuals of the dependent variable against the residuals of each independent variable in turn (both variables are regressed separately on each x)
  - Partial residual plots show the relationship between y and a specific x with the effects of the other x's removed - this requires at least two independent variables
- Residuals (error terms) should be normally distributed - normality of residual error
  - Assess this by plotting a histogram of the error terms
  - Or by using a p-p plot –> this plots the data against a theoretical normal distribution (the points of a p-p plot should fall on a straight line)
  - i.e. the variation in the residual error of Y is not explained by the predictors
- Homoscedasticity (stability in the variance of the residuals)
  - The variance of the error terms doesn't depend on the value of x
  - Check a scatterplot of standardised residuals against standardised predicted values - there should be no pattern (i.e. as x increases the standardised residuals should not increase)
- Independent observations (i.e. no couples etc.)
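A sketch of the residual checks above on hypothetical simulated (homoscedastic) data: standardise the residuals, then look at whether their size trends with the fitted values (a crude numeric stand-in for inspecting the scatterplot).

```python
import numpy as np

rng = np.random.default_rng(4)
n = 300
x = rng.normal(size=n)
y = 1.0 + 2.0 * x + rng.normal(size=n)   # homoscedastic by construction

X = np.column_stack([np.ones(n), x])
b, *_ = np.linalg.lstsq(X, y, rcond=None)
fitted = X @ b
resid = y - fitted

# Standardised residuals: roughly mean 0, sd 1 if assumptions hold
z_resid = (resid - resid.mean()) / resid.std(ddof=1)

# Crude homoscedasticity check: |residuals| should not trend with the
# fitted values (a strong correlation here suggests heteroscedasticity)
corr = np.corrcoef(np.abs(z_resid), fitted)[0, 1]
print(round(corr, 3))
```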
How do we assess the assumptions of MLR in SPSS?
Analyse - regression - linear - define terms
PLOTS –> tick histogram + normal probability plot + produce all partial plots
- Put *ZRESID (standardised residuals) on the Y axis
- Put *ZPRED (standardised predicted values) on the X axis
How do we check assumption 1 for MLR (linearity between each continuous predictor and Y)?
#1 Check the partial residual plots (the scatterplots illustrating the relationship between the residuals of Y and the residuals of each X) –> for a variable to be included in the model, its scatterplot should show a linear relationship
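A numeric sketch of the idea behind these plots (added-variable / partial regression plots), on hypothetical simulated data: regress y on the other x's, regress x1 on the other x's, and relate the two sets of residuals. By the Frisch-Waugh-Lovell result, the slope between them equals x1's partial regression coefficient from the full model.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 300
x1 = rng.normal(size=n)
x2 = rng.normal(size=n) + 0.5 * x1       # predictors mildly correlated
y = 1.0 + 1.5 * x1 + 2.0 * x2 + rng.normal(size=n)

def resid(target, X):
    b, *_ = np.linalg.lstsq(X, target, rcond=None)
    return target - X @ b

others = np.column_stack([np.ones(n), x2])   # intercept + the other predictor
e_y = resid(y, others)       # part of y not explained by x2
e_x1 = resid(x1, others)     # part of x1 not explained by x2

# Slope of e_y on e_x1 equals x1's partial regression coefficient in the
# full model; plotting e_y against e_x1 should look linear if the
# linearity assumption holds for x1
slope = (e_x1 @ e_y) / (e_x1 @ e_x1)
print(slope)
```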