Lecture 11 Flashcards
When is it appropriate to use multiple regression? What does it tell us?
MR examines the degree of linear association between a set of IVs and ONE DV.
MR allows us to test hypotheses about which specific IVs contribute to variation in the DV. It is appropriate for analysing linear relationships only.
Give a clinical example of multiple regression. How is multiple regression clinically important?
Which factors (IVs) predict the amount of improvement (DV) after therapy? Examples of IVs include: age, number of sessions, gender, parental involvement, severity of impairment.
Do these factors together predict level of improvement, and which factors contribute significantly to the prediction?
This type of research is important in determining which variables moderate the treatment effect, so as to plan therapy that maximises the treatment outcome.
What is the general model for simple linear regression?
Y = C + BX + Error
Y’ = C + BX
(where Y’ is the predicted value of the DV if the linear model was 100% perfect and contained no error).
Error is the residual term (i.e. the part of Y not explained by the regression model, i.e. Y - Y’).
What is the aim of regression analysis?
To determine the linear equation that minimises the sum of the squared residuals (Y - Y’).
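As a minimal sketch of this aim, the code below fits Y’ = C + BX by ordinary least squares in plain Python. The data (number of therapy sessions as X, improvement score as Y) and the function names are made up for illustration only.

```python
# Hypothetical data: X = number of therapy sessions, Y = improvement score.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.1, 5.9, 8.2, 9.8]


def fit_simple_regression(xs, ys):
    """Return (C, B) that minimise the sum of squared residuals (Y - Y')."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sum of squares of X around their means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx            # slope B
    c = mean_y - b * mean_x  # intercept C
    return c, b


def sum_squared_residuals(xs, ys, c, b):
    """Sum of (Y - Y')^2 for the line Y' = C + BX."""
    return sum((y - (c + b * x)) ** 2 for x, y in zip(xs, ys))


c, b = fit_simple_regression(xs, ys)
best = sum_squared_residuals(xs, ys, c, b)
print(c, b, best)
```

For these made-up numbers the fitted line is roughly Y’ = 0.09 + 1.97X. Any other choice of C or B gives a larger sum of squared residuals for the same data.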
What does R represent?
The strength of the relationship between the set of predictors (Xs) and DV (Y).
It varies between 0 and 1: When R is 0 there is no linear relationship, and when R is 1 there is a perfect linear relationship.
What does R squared represent?
The proportion (often expressed as a percentage) of variance in Y that is explained by the set of predictors (Xs).
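One way to see what R squared measures is to compute it as 1 minus the ratio of residual variation to total variation in Y, which holds for a least-squares fit that includes an intercept. The sketch below assumes hypothetical observed scores and predictions from an already fitted line; the numbers and function name are illustrative only.

```python
import math


def r_squared(observed, predicted):
    """Proportion of variance in Y explained by the model's predictions."""
    mean_y = sum(observed) / len(observed)
    ss_total = sum((y - mean_y) ** 2 for y in observed)
    ss_residual = sum((y - p) ** 2 for y, p in zip(observed, predicted))
    return 1 - ss_residual / ss_total


# Hypothetical observed DV scores and the values Y' predicted by a fitted line.
observed = [2.0, 4.1, 5.9, 8.2, 9.8]
predicted = [2.06, 4.03, 6.0, 7.97, 9.94]

r2 = r_squared(observed, predicted)
r = math.sqrt(r2)  # R is the square root of R squared
print(r2, r)
```

Perfect predictions give an R squared of 1, and a model that always predicts the mean of Y gives 0.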
What are four requirements that need to be examined before multiple regression takes place?
- Is the relationship linear?
- Is the relationship positive or negative?
- Is the relationship weak or strong? (indicated by R-squared)
- Are there any outliers?
How do you know if there are outliers?
The standardised residuals should be approximately normally distributed with a mean of 0 and a standard deviation of 1, so most of them should fall between about -2 and 2.
Standardised residuals greater than 3 or less than -3 are more than 3 standard deviations from the mean and may be classed as outliers. They skew the distribution and can have a large impact on the regression coefficients.
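The check described above can be sketched as follows. This is a simplified version that standardises residuals with their sample standard deviation; statistical packages may use a slightly different denominator. The residual values are made up for illustration and do not come from a real model.

```python
import statistics


def standardise(residuals):
    """Convert residuals to z-scores (mean 0, standard deviation 1)."""
    mean = statistics.mean(residuals)
    sd = statistics.stdev(residuals)  # sample standard deviation
    return [(e - mean) / sd for e in residuals]


def flag_outliers(residuals, cutoff=3.0):
    """Return the positions of cases whose standardised residual exceeds the cutoff."""
    z_scores = standardise(residuals)
    return [i for i, z in enumerate(z_scores) if abs(z) > cutoff]


# Hypothetical residuals (Y - Y') for 20 cases; the last case is far from the rest.
residuals = [0.5, -0.3, 0.2, -0.6, 0.1, -0.1, 0.4, -0.4, 0.3, -0.2,
             0.0, 0.6, -0.5, 0.2, -0.3, 0.1, -0.1, 0.3, -0.2, 6.0]

z_scores = standardise(residuals)
flagged = flag_outliers(residuals)
print(flagged)
```

Here only the last case (position 19, counting from 0) is flagged, with a standardised residual of roughly 4.1.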
What is partial correlation?
Partial correlation is a way of examining the (linear) correlation between two variables, X1 and Y, while “controlling” for some other variable (e.g., X2).
It reflects the amount of variance shared between those two variables that is unique to them (and is therefore not also shared with the control variables).
Give a clinical example of partial correlation.
Research question: Is age (IV) related to recovery from stroke (DV)?
- Seems to be a significant linear relationship between the two variables. However, severity of stroke also increases with age, and therefore severity may be a confounding variable/covariate.
- After controlling for severity, “age” is no longer significantly related to amount of recovery.
- Therefore, any effect that “age” has on “recovery” is also shared/explained by “severity”.
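The stroke example can be illustrated with the standard first-order partial correlation formula, which takes the three pairwise correlations as input. The correlation values below are hypothetical and chosen only to mimic the pattern described; they are not from any study.

```python
import math


def partial_correlation(r_xy, r_xz, r_yz):
    """Correlation between X and Y after controlling for Z.

    r_xy: correlation between X and Y
    r_xz: correlation between X and the control variable Z
    r_yz: correlation between Y and the control variable Z
    """
    numerator = r_xy - r_xz * r_yz
    denominator = math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))
    return numerator / denominator


# Hypothetical correlations: X = age, Y = recovery, Z = severity of stroke.
r_age_recovery = -0.50       # older patients appear to recover less
r_age_severity = 0.70        # severity increases with age
r_recovery_severity = -0.70  # more severe strokes show less recovery

partial = partial_correlation(r_age_recovery, r_age_severity, r_recovery_severity)
print(partial)
```

With these made-up values the age and recovery correlation drops from -0.50 to about -0.02 once severity is controlled for, which is the pattern the example describes.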
What is the general linear model in multiple regression?
Y = C + B1X1 + B2X2 + ... + BkXk + Error
Y’ = C + B1X1 + B2X2 + ... + BkXk
What does multiple regression assume about residuals?
The analysis assumes the residuals have a normal distribution with mean = 0
How does multiple regression minimise the error of prediction?
It determines the Y intercept and the optimal weights (regression coefficients) for each IV or predictor (X) so that the sum of squared residuals is at a minimum (i.e. so that the observed DV scores are clustered as closely as possible around the predicted values).
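A minimal sketch of estimating the intercept and weights for two predictors is given below. It assumes NumPy is installed and uses its least-squares solver; the data and variable names are made up for illustration.

```python
import numpy as np

# Hypothetical data: two predictors (X1, X2) and one DV (Y) for six cases.
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
x2 = np.array([2.0, 1.0, 4.0, 3.0, 6.0, 5.0])
y = np.array([9.1, 7.8, 19.1, 18.0, 28.9, 28.1])

# Design matrix: a column of ones for the intercept C, then one column per IV.
design = np.column_stack([np.ones_like(x1), x1, x2])

# Least-squares solution: the C, B1, B2 that minimise the sum of squared residuals.
coefficients, _, _, _ = np.linalg.lstsq(design, y, rcond=None)
c, b1, b2 = coefficients

predicted = design @ coefficients  # Y'
residuals = y - predicted          # Y - Y'
ssr = float(np.sum(residuals ** 2))
print(c, b1, b2, ssr)
```

Because the model includes an intercept, the residuals from this fit average to 0, and changing any coefficient away from the fitted value increases the sum of squared residuals.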
What is the null hypothesis in multiple regression analysis?
That all regression coefficients (B1 to Bk) are 0 (i.e. that none of the IVs has a linear relationship with the DV).
What can we conclude from a significant F ratio in multiple regression analysis?
That at least one of the IVs has a linear relationship with the DV. However, the F test is an omnibus test: it doesn’t tell us which IVs are significantly (and linearly) related to the DV.
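For reference, the omnibus F ratio can be computed from R squared, the number of predictors (k) and the sample size (n). The sketch below uses the standard formula with hypothetical values; the function name is illustrative.

```python
def f_ratio(r_squared, n_cases, n_predictors):
    """Omnibus F ratio for a multiple regression.

    Degrees of freedom are k for the model and n - k - 1 for the residuals.
    """
    df_model = n_predictors
    df_residual = n_cases - n_predictors - 1
    if df_residual <= 0:
        raise ValueError("need more cases than predictors plus one")
    if not 0.0 <= r_squared < 1.0:
        raise ValueError("R squared must be at least 0 and less than 1")
    explained = r_squared / df_model
    unexplained = (1.0 - r_squared) / df_residual
    return explained / unexplained


# Hypothetical study: 4 IVs, 50 participants, R squared = 0.40.
f_value = f_ratio(0.40, n_cases=50, n_predictors=4)
print(f_value)
```

The resulting F is compared against the F distribution with (k, n - k - 1) degrees of freedom. Tests of the individual coefficients are then needed to see which IVs contribute.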