Multiple linear regression Flashcards
Define
Unstandardised coefficient
represents the expected change in the DV for a one-unit change in the IV. Its magnitude is hard to interpret because it depends on the scales of the variables
Define
Standardised coefficient (β)
represents the expected change in the DV, in standard deviations, for a one standard deviation change in the predictor
Define
Adjusted R^2
gives an estimate of R^2 in the population, taking into account that the regression model might overfit your particular dataset
Define
Sampling error
the discrepancy between sample and population
Define
R^2 shrinkage
the drop in R^2 when the regression model is applied to a new sample, i.e. a failure to replicate the original R^2
Define
Cohen’s f2
a measure of effect size based on R^2; in multiple regression it can tell you the unique effect of one predictor variable
Define
Homoscedasticity
the residuals have equal (constant) variance at every level of the predictors
Define
Heteroscedasticity
the residuals have unequal variances across levels of the predictors
Define
Multiple regression
a type of regression that is used to predict values of an outcome from several predictors
Define
Multicollinearity
the presence of high intercorrelations between predictor variables
Define
F-ratio
The overall significance of the regression equation; a significant value indicates that the equation predicts a significant proportion of the variability in the Y scores (i.e., more than would be expected by chance alone)
Define
Dummy coding
the process of coding a categorical variable into dichotomous variables
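A minimal sketch of dummy coding in plain Python. The function name `dummy_code`, the `is_` prefix, and the example categories are made up for illustration. It assumes the common convention of one reference category, so a variable with k categories becomes k - 1 dichotomous (0/1) variables.

```python
def dummy_code(values, reference):
    """Turn a categorical variable into 0/1 indicator variables.

    The reference category gets no column of its own: it is the case
    where every indicator equals 0.
    """
    levels = sorted(set(values) - {reference})
    return {
        f"is_{level}": [1 if v == level else 0 for v in values]
        for level in levels
    }


# Hypothetical data: relationship status with "single" as the reference.
status = ["single", "married", "divorced", "single"]
coded = dummy_code(status, reference="single")
print(coded)
```

Each coefficient for a dummy variable is then read as the difference between that category and the reference category.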
Define
Semi-partial (part) correlation
Measures the relationship between two variables, controlling for the effect that a third variable has on one of the others
What does the unstandardised coefficient of this relationship mean?

-1.584 indicates that a 1 unit increase in self-esteem is predicted to be associated with a 1.584 unit decrease in loneliness
Why is it hard to interpret the magnitude of the effect from unstandardised coefficients?
Coefficients depend on the scale of the predictors and the DVs
If self-esteem ranges from 0 to 2, a 1 unit change is a huge difference; if the range is 0 to 100, a 1 unit change is very small
What does the standardised beta coefficient tell you about this relationship?

In the current regression, β = -.257, meaning that a 1 SD increase in self-esteem is predicted to be associated with a .257 SD decrease in loneliness
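The contrast between the two kinds of coefficient can be sketched with made-up data (the numbers below are simulated, not the values from these cards). It assumes a single predictor, where the standardised coefficient can be obtained as β = b × SD(X) / SD(Y) and equals Pearson's r. Rescaling the predictor changes b but leaves β untouched.

```python
import numpy as np

rng = np.random.default_rng(42)
# Simulated scores, purely for illustration.
self_esteem = rng.normal(loc=30, scale=5, size=250)
loneliness = 60 - 0.5 * self_esteem + rng.normal(scale=8, size=250)


def slope(x, y):
    """Unstandardised coefficient b from a simple linear regression."""
    return np.polyfit(x, y, 1)[0]  # polyfit returns [slope, intercept]


def standardised_beta(x, y):
    """Standardised coefficient: b rescaled into SD units."""
    return slope(x, y) * x.std() / y.std()


b = slope(self_esteem, loneliness)
b_rescaled = slope(self_esteem / 10, loneliness)       # same data, new scale
beta = standardised_beta(self_esteem, loneliness)
beta_rescaled = standardised_beta(self_esteem / 10, loneliness)

print(b, b_rescaled)        # b changes by a factor of 10
print(beta, beta_rescaled)  # beta is unchanged
```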
Imagine that the variance of the DV is 5, and after calculating the regression, the variance of the residuals is 4. What is the model explained variance?
5 - 4 = 1
What is the formula for R^2?
Imagine that the variance of the DV is 5, and after calculating the regression, the variance of the residuals is 4. What is R^2?
R^2 = MEV / Total variance
Model explained variance (MEV) = 5 - 4 = 1
Total variance = 5
R^2 = 1 / 5 = 0.2 = 20%
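The same arithmetic as a small Python sketch; the function name `r_squared` is just an illustrative choice.

```python
def r_squared(total_variance, residual_variance):
    """Proportion of the DV's variance explained by the model."""
    explained = total_variance - residual_variance  # model explained variance
    return explained / total_variance


r2 = r_squared(total_variance=5, residual_variance=4)
print(r2)  # 0.2
```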
What is considered a small, medium and large R^2?
There are no fixed rules for what a small, medium, or large R^2 is, but many people consider .04 (4%) as small, .09 (9%) as medium, and .25 (25%) as large
What is an adjusted R^2? What does it consider?
Gives an estimate of R^2 in the population. It takes into account the sample size, the number of predictors, etc.
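A sketch of one widely used adjustment, 1 - (1 - R^2)(n - 1)/(n - k - 1), where n is the sample size and k the number of predictors. The cards do not say which formula is intended, so treat this as an assumption. The n and k values below are hypothetical.

```python
def adjusted_r_squared(r2, n, k):
    """Adjusted R^2 for a model with k predictors fitted to n cases."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


# Hypothetical example: R^2 = .20 with 2 predictors.
small_sample = adjusted_r_squared(0.20, n=50, k=2)
large_sample = adjusted_r_squared(0.20, n=5000, k=2)
print(small_sample, large_sample)
```

The adjustment is larger in the small sample, which matches the idea that sampling error grows as sample size falls and as predictors are added.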
Sampling error increases as sample size _________and as the number of predictors ___________
Sampling error increases as sample size decreases and as the number of predictors increases
Why is R^2 likely to overestimate the size of the effect in the population?
R^2 is inflated by sampling error, which increases as sample size decreases and as the number of predictors increases
Shrinkage is best evaluated using a ______________
Shrinkage is best evaluated using a cross validation study
What is it called when a regression model does not work with a new data set?
Shrinkage
What happens if there is a large discrepancy between R^2 and adjusted R^2?
It indicates the regression model does not generalise well to the population
When is Cohen’s f2 useful?
Most useful in its multiple regression variant, which gives the unique effect of one predictor
Age accounted for 10% of the variance in loneliness on its own
Age + self-esteem combined accounted for 15% of variance in loneliness
What is the f2 for self-esteem?
Difference between these two is self-esteem unique variance (15% - 10% = 5%)
f2 for the unique effect of self-esteem would be:
f2 = (.15 − .10) / (1 − .15) = .05 / .85 = .059
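The calculation above as a short Python sketch; the function and argument names are illustrative.

```python
def cohens_f2_unique(r2_full, r2_reduced):
    """Cohen's f2 for the unique effect of the predictor(s) added last.

    r2_full:    R^2 of the model with the predictor of interest
    r2_reduced: R^2 of the model without it
    """
    return (r2_full - r2_reduced) / (1 - r2_full)


# Age alone: R^2 = .10; age + self-esteem: R^2 = .15
f2_self_esteem = cohens_f2_unique(r2_full=0.15, r2_reduced=0.10)
print(round(f2_self_esteem, 3))  # 0.059
```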

What assumptions must be met to run a linear regression?
- Outcome variable must be continuous; predictors can be continuous or dichotomous
- Predictors must not have zero variance
- Independence
- Linearity
- Independent errors
- Normally-distributed errors
- Homoscedasticity
What type of plot can we use to look for non-linearity in regression? What indicates that the assumption has been met?
A plot of the residuals against the predicted values (partial regression plots can also be checked for each predictor)
The absence of a clear pattern means the assumption has been met
How do we deal with non-linear data in regression?
What does the independent errors assumption mean?
For any 2 observations, the errors or residual terms should be independent (i.e., uncorrelated) with one another
Observations should be independent and there should not be any systematic relationship amongst the residuals
What does the normally-distributed errors assumption mean?
Residuals for the regression model should be random and normally distributed, with mean = 0
Note: this does not mean the predictors have to be normally distributed, although normally distributed predictors do improve the chances of this assumption being met

What assumption does this scatterplot violate?

Homoscedasticity
What should you report from a linear regression?
- How you assessed the assumptions and whether they appear violated
- Any attempts to address violated assumptions
- Whether these attempts resolved the assumption issues or if they were still violated
- Descriptive statistics (e.g. mean and SD/median and IQR/frequency and percent)
- Standardised coefficient, effect size, R^2, confidence intervals and p-value
How is the multiple regression equation different to the linear regression one?
Add bkXk, where k = number of predictor variables included
In multiple regression, bk captures the unique relationship between predictor k and the outcome, conditional on (adjusted for) all other predictors in the model
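A rough sketch of fitting Y = b0 + b1X1 + b2X2 by ordinary least squares with NumPy. The data are simulated and the "true" coefficients are arbitrary values chosen for illustration; statistical packages report the same b values along with standard errors and p-values.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 500
X = rng.normal(size=(n, 2))            # two simulated predictors
true_b = np.array([2.0, -1.5, 0.5])    # b0, b1, b2 (arbitrary)
y = true_b[0] + X @ true_b[1:] + rng.normal(size=n)

# A column of ones lets the model estimate the intercept b0.
design = np.column_stack([np.ones(n), X])
b, *_ = np.linalg.lstsq(design, y, rcond=None)

predicted = design @ b                 # b0 + b1*X1 + b2*X2 for every case
print(b)
```

Each slope in `b` is estimated with the other predictor held constant, which is what "conditional on all other predictors" means in practice.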
Which variable is the best predictor?

The β value with the largest magnitude (ignoring the +/- signs) is the best predictor
Thus, self-esteem is the best predictor of loneliness – indeed, it is the only significant predictor
What does this R^2 of a multiple regression model tell you? What does it mean for this example?

R Square (R^2) tells you the proportion of variability in the outcome variable that is accounted for by the predictor variables
In our example, 7% of the variability in loneliness is accounted for by self-esteem and number of exercise days (when considered together)
What statistics do we use to measure multicollinearity?
Tolerance: predictor variables with tolerance < .10 are multicollinear with 1+ other predictors, which is concerning (i.e., you have a multicollinearity problem)
VIF: VIF = 1/tolerance; predictor variables with VIF > 10 are multicollinear with 1+ other predictors, which is concerning (i.e., you have a multicollinearity problem)
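A sketch of how these two statistics can be computed by hand, assuming the usual definition of tolerance as 1 - R^2 from regressing each predictor on all the other predictors. The data are simulated so that the third predictor is almost a copy of the first.

```python
import numpy as np


def tolerance_and_vif(X):
    """Return (tolerance, VIF) arrays with one entry per column of X."""
    n, k = X.shape
    tolerance = np.empty(k)
    for j in range(k):
        target = X[:, j]
        others = np.delete(X, j, axis=1)
        design = np.column_stack([np.ones(n), others])
        coef, *_ = np.linalg.lstsq(design, target, rcond=None)
        residuals = target - design @ coef
        r2_j = 1 - residuals.var() / target.var()
        tolerance[j] = 1 - r2_j
    return tolerance, 1 / tolerance


rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = rng.normal(size=200)                   # unrelated to x1
x3 = x1 + rng.normal(scale=0.1, size=200)   # nearly identical to x1
tol, vif = tolerance_and_vif(np.column_stack([x1, x2, x3]))
print(tol, vif)
```

Here x1 and x3 should be flagged (low tolerance, high VIF) while x2 should not.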
What extra assumption does multiple regression have that linear regression doesn't?
No multicollinearity
What should you do if you have an issue with multicollinearity?
Consider deleting one of the offending variables
Combine the variables with high intercorrelations into a single measure
What does the semi-partial correlations tell us in this example?

The variability in loneliness uniquely accounted for by self-esteem = (-.261)^2 * 100 = 6.81%
The variability in loneliness uniquely accounted for by exercise = (.058)^2 * 100 = 0.34%
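A sketch of where a semi-partial correlation comes from, using simulated data (not the values on this card). It assumes the standard construction: remove the other predictor from the predictor of interest, then correlate what is left with the outcome. The squared result equals the drop in R^2 when that predictor is removed from the model.

```python
import numpy as np


def fit_r2(X, y):
    """R^2 of an OLS regression of y on X (intercept included)."""
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ coef
    return 1 - residuals.var() / y.var()


def residualise(x, others):
    """Part of x that cannot be predicted from the other predictor(s)."""
    design = np.column_stack([np.ones(len(x)), others])
    coef, *_ = np.linalg.lstsq(design, x, rcond=None)
    return x - design @ coef


rng = np.random.default_rng(1)
x2 = rng.normal(size=300)
x1 = 0.5 * x2 + rng.normal(size=300)
y = -0.4 * x1 + 0.1 * x2 + rng.normal(size=300)

sr = np.corrcoef(y, residualise(x1, x2))[0, 1]   # semi-partial correlation
unique_pct = sr ** 2 * 100                       # % of variance unique to x1
r2_drop = fit_r2(np.column_stack([x1, x2]), y) - fit_r2(x2, y)
print(sr, unique_pct, r2_drop * 100)
```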
The overall significance of the regression equation can be evaluated by computing an ________
The overall significance of the regression equation can be evaluated by computing an F-ratio
What does a significant F-ratio indicate?
A significant F-ratio indicates that the equation predicts a significant proportion of the variability in the Y scores (i.e., more than would be expected by chance alone)
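A sketch of how the F-ratio relates to R^2, assuming the standard formula F = (R^2 / k) / ((1 - R^2) / (n - k - 1)) with k predictors and n cases. The R^2 of .07 and k = 2 echo the loneliness example; the sample size of 200 is hypothetical because the cards do not give it.

```python
def f_ratio(r2, n, k):
    """F-ratio for the overall regression with k predictors and n cases."""
    explained_per_df = r2 / k                    # df1 = k
    unexplained_per_df = (1 - r2) / (n - k - 1)  # df2 = n - k - 1
    return explained_per_df / unexplained_per_df


F = f_ratio(r2=0.07, n=200, k=2)  # n = 200 is an assumed sample size
print(round(F, 2))
```

The resulting F is compared against the F distribution with (k, n - k - 1) degrees of freedom to obtain the p-value.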