Regression Flashcards
When is linear regression used?
Used when the relationship between variables x and y can be described with a straight line
Determines the strength of the relationship between x and y but it doesn’t tell us how much y changes based on a given change in x
a. Correlation
b. Regression
a. Correlation
Define correlation
Determines the strength of the relationship between x and y but it doesn’t tell us how much y changes based on a given change in x
Determines the strength of the relationship between x and y and tells us how much y changes based on a given change in x
a. Correlation
b. Regression
b. Regression
Define regression
Determines the strength of the relationship between x and y and tells us how much y changes based on a given change in x
By proposing a model of the relationship between x and y, regression allows us to …?
Estimate how much y will change as a result of a given change in x
Estimate how much y will change as a result of a given change in x
a. Correlation
b. Regression
b. Regression
Distinguishes between the variable being predicted and the variable(s) used to predict
a. Correlation
b. Regression
b. Regression
True or False?
Correlation distinguishes between the variable being predicted and the variable(s) used to predict
False
Regression distinguishes between the variable being predicted and the variable(s) used to predict
How many predictor variables are in a simple linear regression?
There is only one predictor variable
What is the variable that is being predicted?
a. x
b. y
b. y
What is the outcome variable?
a. x
b. y
b. y
What is the predictor variable?
a. x
b. y
a. x
What is the variable that is used to predict?
a. x
b. y
a. x
y is…?
a. The criterion variable
b. The dependent variable
c. The outcome variable
d. The predictor variable
e. The independent variable
f. The explanatory variable
a. The criterion variable
b. The dependent variable
c. The outcome variable
x is…?
a. The predictor variable
b. The dependent variable
c. The independent variable
d. The criterion variable
e. The outcome variable
f. The explanatory variable
a. The predictor variable
c. The independent variable
f. The explanatory variable
Why might researchers use regression?
List 3 reasons
- To investigate the strength of the effect x has on y
- To estimate how much y will change as a result of a given change in x
- To predict a future value of y, based on a known value of x
Makes the assumption that y is (to some extent) dependent on x
a. Correlation
b. Regression
b. Regression
True or False?
The dependence of y on x will always reflect causal dependency
False
The dependence of y on x may or may not reflect causal dependency
True or False?
Regression provides direct evidence of causality
False
Regression does not provide direct evidence of causality
Linear regression consists of 3 stages
What are they?
- Analysing the relationship between variables
- Proposing a model to explain that relationship
- Evaluating the model
- Analysing the relationship between variables
- Proposing a model to explain that relationship
- Evaluating the model
These are stages of…?
a. Regression
b. Correlation
c. ANOVA
d. t-test
a. Regression
The first stage of regression involves analysing the relationship between variables
How do we do this?
By determining the strength and direction of the relationship (equivalent to correlation)
The second stage of regression involves proposing a model to explain the relationship
How do we do this?
By drawing the line of best-fit (regression line)
The third stage of regression involves evaluating the model to explain that relationship
How do we do this?
By assessing the goodness of the line of best-fit
What is the intercept?
Value of y when x is 0
What is the slope?
How much y changes as a result of a 1 unit increase in x
How much y changes as a result of a 1 unit increase in x
This is known as…?
a. The slope
b. The intercept
a. The slope
Value of y when x is 0
This is known as…?
a. The slope
b. The intercept
b. The intercept
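The slope and intercept definitions above can be sketched with a least-squares fit, using only the Python standard library. The data values and the function name are illustrative.

```python
def fit_line(xs, ys):
    """Return (slope b, intercept a) of the least-squares line y = b*x + a."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: how much y changes for a 1-unit increase in x
    b = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) \
        / sum((x - mean_x) ** 2 for x in xs)
    # Intercept: the value of y when x is 0
    a = mean_y - b * mean_x
    return b, a

b, a = fit_line([1, 2, 3, 4], [3, 5, 7, 9])   # data lie exactly on y = 2x + 1
print(b, a)   # → 2.0 1.0
```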
Assumes no relationship between x and y (b=0)
a. Best model
b. Simplest model
b. Simplest model
Based on the relationship between x and y
a. Best model
b. Simplest model
a. Best model
Consists of the regression line
a. Best model
b. Simplest model
a. Best model
Consists of a flat, horizontal line
a. Best model
b. Simplest model
b. Simplest model
What does the simplest model assume?
Assumes no relationship between x and y (b=0)
What is the best model based on?
Based on the relationship between x and y
How do we calculate the goodness of fit in the simplest model regression?
Refer to the total variance
Variance not explained by the mean of y
a. Best model
b. Simplest model
b. Simplest model
Variance not explained by the regression line
a. Best model
b. Simplest model
a. Best model
What is the residual variance for the best model?
Variance not explained by the regression line
What is the total variance for the simplest model?
Variance not explained by the mean of y
How do we calculate the goodness of fit in the best model regression?
Refer to the residual variance
Calculate goodness of fit using residual variance
a. Best model
b. Simplest model
a. Best model
Calculate goodness of fit using total variance
a. Best model
b. Simplest model
b. Simplest model
The difference between the observed values of y and the mean of y
i.e. the variance in y not explained by the simplest model (b = 0)
a. SST
b. SSR
a. SST
SST is…?
a. The difference between the observed values of y and those predicted by the regression line
b. The difference between the observed values of y and the mean of y
b. The difference between the observed values of y and the mean of y
The difference between the observed values of y and those predicted by the regression line
i.e. the variance in y not explained by the regression model
a. SST
b. SSR
b. SSR
What does the difference between SST and SSR reflect?
Reflects the improvement in prediction using the regression model compared to the simplest model
i.e. the reduction in unexplained variance using the regression model compared to the simplest model
What is the formula to calculate SSM?
SST - SSR = SSM
The larger the SSM, the _______ the improvement in prediction using the regression model over the simplest model
a. Smaller
b. Bigger
b. Bigger
The larger the SSM, the bigger the …?
Improvement in prediction using the regression model over the simplest model
The _____ the SSM, the bigger the improvement in prediction using the regression model over the simplest model
a. Larger
b. Smaller
a. Larger
How do we evaluate the improvement due to the model (SSM), relative to the variance the model does not explain (SSR)?
Use an F-test (ANOVA)
What do we use an F-test (ANOVA) for when assessing the goodness of fit for a regression?
To evaluate the improvement due to the model (SSM), relative to the variance the model does not explain (SSR)
The improvement due to the model is known as…?
a. SSM
b. SSR
a. SSM
The variance the model does not explain is known as…?
a. SSM
b. SSR
b. SSR
Rather than using the Sums of Squares (SS) values, the F-test uses …?
Mean Squares (MS) values
True or False?
F-test uses Sums of Squares (SS) values
False
F-test uses Mean Squares (MS) values
True or False?
F-test uses Mean Squares (MS) values, which do not take the degrees of freedom into account
False
F-test uses Mean Squares (MS) values, which take the degrees of freedom into account
What is the formula for MSM?
MSM = SSM / dfM
What is the formula for MSR?
MSR = SSR / dfR
Provides a measure of how much the model has improved the prediction of y, relative to the level of inaccuracy of the model
This is known as…?
F-ratio
What is an F-ratio in regression?
Provides a measure of how much the model has improved the prediction of y, relative to the level of inaccuracy of the model
F-ratio provides a measure of how much the model has improved the prediction of y, relative to …?
The level of inaccuracy of the model
What is the formula for the F-ratio in regression?
F = MSM / MSR
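As a hedged sketch, the SST/SSR/SSM decomposition and the F-ratio above can be computed directly. The data and function name are illustrative; dfM = 1 for one predictor and dfR = n - 2 (n observations minus two estimated parameters).

```python
def f_test(xs, ys, b, a):
    """Return (SST, SSR, SSM, F) for the line y = b*x + a with one predictor."""
    n = len(ys)
    mean_y = sum(ys) / n
    preds = [b * x + a for x in xs]
    sst = sum((y - mean_y) ** 2 for y in ys)            # unexplained by the mean of y
    ssr = sum((y - p) ** 2 for y, p in zip(ys, preds))  # unexplained by the regression line
    ssm = sst - ssr                                     # improvement due to the model
    df_m, df_r = 1, n - 2                               # degrees of freedom
    f = (ssm / df_m) / (ssr / df_r)                     # F = MSM / MSR
    return sst, ssr, ssm, f

sst, ssr, ssm, f = f_test([1, 2, 3, 4], [2, 4, 5, 8], b=1.9, a=0.0)
print(round(f, 2))   # large F: the model predicts y far better than the mean alone
```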
If the regression model is good at predicting y (relative to the simplest model, i.e. b = 0), the improvement in prediction due to the model (MSM) will be ______ , while the level of inaccuracy of the model (MSR) will be _____
a. Small, small
b. Small, large
c. Large, large
d. Large, small
d. Large, small
If the regression model is good at predicting y (relative to the simplest model, i.e. b= 0), the improvement in prediction due to the model (MSM) will be _____
a. Large
b. Small
c. Medium
d. 0
a. Large
If the regression model is good at predicting y (relative to the simplest model, i.e. b = 0), the level of inaccuracy of the model (MSR) will be ______
a. Large
b. Small
c. Medium
d. 0
b. Small
What does it mean when the F-value is further from 0?
The regression model is good at predicting y
What is the null hypothesis for the regression model?
The regression model and the simplest model are equal (in terms of predicting y)
MSM = 0
What is the formula for the regression equation (predicting y from x)?
y = bx + a
or
y = slope (x) + intercept
What is the formula used for predicting y from x?
y = bx + a
or
y = slope (x) + intercept
Calculate y from x when…?
a = 2.76
b = 1.06
x = 6 miles
y = bx + a
y = (1.06 * x) + 2.76
y = (1.06 * 6) + 2.76
y = 6.36 + 2.76
y = 9.12
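The worked example above, sketched in Python (the coefficient values come from the card):

```python
def predict(x, b, a):
    # y = slope * x + intercept
    return b * x + a

# a = 2.76, b = 1.06, x = 6 miles (values from the card above)
print(round(predict(6, b=1.06, a=2.76), 2))   # → 9.12
```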
True or False?
If x and y are negatively correlated the value of b in y = bx + a will be positive
False
If x and y are negatively correlated the value of b in y = bx + a will be negative
What are the 3 assumptions of regression?
- Linearity: x and y must be linearly related
- Absence of outliers
- Normality, linearity and homoscedasticity, independence of residuals
What is the linearity assumption for regression?
x and y must be linearly related
i.e. the relationship between x and y can be described by a straight line
What is the absence of outliers assumption for regression?
Regression, like correlation, is extremely sensitive to outliers
It may be appropriate to remove such data points
What is the normality of residuals assumption for regression?
Residuals should be normally distributed around the predicted outcome
What is the linearity of residuals assumption for regression?
Residuals should have a straight line relationship with the outcome
What is the homoscedasticity of residuals assumption for regression?
Variance of residuals about the outcome should be the same for all predicted scores
What is the non-parametric equivalent for regression?
There are none
Instead, we can attempt a fix
What do we check to determine whether the assumptions for regression have been met?
Refer to a scatterplot
How do we know the assumptions for regression have been met based on a Normal P-P Plot of Regression Standardized Residual results?
Data points will lie in a reasonably straight diagonal line, from bottom left to top right
This would suggest no major deviations from normality
Data points will lie in a reasonably straight diagonal line, from bottom left to top right
What does this suggest?
No major deviations from normality
How do we know the homoscedasticity assumption for regression have been met based on a scatterplot of Regression Standardized Residual results?
Residuals will be roughly rectangularly distributed, with most scores concentrated in the centre (0)
We don't want to see systematic patterns in the residuals (curvilinear, or higher on one side)
What are considered outliers on a scatterplot of Regression Standardized Residual results?
Standardised residuals > 3.3 or < -3.3
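The ±3.3 cutoff above can be sketched as a simple filter over standardised residuals (the values and function name are illustrative):

```python
def flag_outliers(std_residuals, cutoff=3.3):
    """Return indices of cases whose standardised residual exceeds the cutoff."""
    return [i for i, z in enumerate(std_residuals) if abs(z) > cutoff]

print(flag_outliers([0.4, -3.5, 1.2, 3.31]))   # → [1, 3]
```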
What does R^2 measure?
The proportion of variance explained by the model
The proportion of variance explained by the model is measured by…?
R^2
What does R measure?
The strength of the relationship between x and y
(Equivalent to r if there is only 1 predictor variable)
The strength of the relationship between x and y (equivalent to r if there is only 1 predictor variable) is measured by…?
R
What does adjusted R^2 measure?
The proportion of variance explained by the model, adjusted to account for the degrees of freedom (number of participants and number of parameters being estimated)
If we wanted to use the regression model to generalise the results of our sample to the population, which one would we use?
a. R
b. R^2
c. Adjusted R^2
c. Adjusted R^2
Why do we use adjusted R^2 when using the regression model to generalise the results of our sample to the population?
Because R^2 is too optimistic
Adjusted R^2 has been adjusted to account for the degrees of freedom (number of participants and number of parameters being estimated)
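One common form of the adjustment (assumed here — the cards themselves give no formula) is the Wherry formula, where n is the number of participants and p the number of predictors:

```python
def adjusted_r2(r2, n, p):
    # Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1)
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

# With a modest sample, adjusted R^2 comes out a little lower than R^2
print(round(adjusted_r2(0.32, n=50, p=1), 3))   # → 0.306
```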
How do we report the F-value based on SPSS results?
F(df Regression, df Residual) = F-value, p = p-value
Where do we find the intercept (a) and the slope (b) on the SPSS output?
Look at the ‘Coefficients’ table
The intercept (a) is the ‘(Constant)’ row under the ‘B’ column
The slope (b) is the row below the ‘(Constant)’ row under the ‘B’ column
The slope converted to a standardised score is known as…?
Beta
What is Beta?
The slope converted to a standardised score
What is the t-value equivalent to?
√F when we only have 1 predictor variable
An equivalent to √F when we only have 1 predictor variable is known as…?
t-value
t-statistic tests the null hypothesis that …?
The value of b is 0
What does the same job as the F-test when we have just one predictor variable?
t-statistic test
The amount of variance in y explained by the model (SSM), relative to the total variance in y (SST)
This is known as…?
R^2
What is R^2?
The amount of variance in y explained by the model (SSM), relative to the total variance in y (SST)
What is the formula for R^2?
R^2 = SSM / SST
How can we express R^2 as a %?
Multiply by 100
If R^2 = .32, we would conclude that the regression model explained ____% of the variance in y
32
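The R^2 = SSM / SST relation and its percentage form can be sketched (the SS values are illustrative, chosen to give R^2 = .32):

```python
def r_squared(ssm, sst):
    # Proportion of the variance in y explained by the model
    return ssm / sst

r2 = r_squared(ssm=6.0, sst=18.75)
print(f"{r2:.2f} → {r2 * 100:.0f}% of the variance in y explained")
```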
What does r^2 measure in a correlation?
The proportion of shared variance between two variables
In regression, we assume that x _____ the variance in y
Explains
r^2 is equivalent to R^2 if …?
We have only 1 predictor
√R^2 = r if …?
We have only 1 predictor
If we have only 1 predictor, how do we calculate r from R^2?
√R^2
If we have only 1 predictor, what is r^2 equivalent to?
R^2
True or False?
In regression, we assume that y explains the variance in x
False
In regression, we assume that x explains the
variance in y
Assesses how much y will change as a
result of a given change in x
a. Simple linear regression
b. Multiple regression
a. Simple linear regression
Assesses the influence of several predictor variables (e.g. x1, x2, x3 etc…) on the outcome variable (y)
a. Simple linear regression
b. Multiple regression
b. Multiple regression
What does multiple regression allow us to do?
Assess the influence of several predictor variables (e.g. x1, x2, x3 etc…) on the outcome variable (y)
We obtain a measure of how much variance in the outcome variable (y) the predictor variables (x1 and x2) combined explain
a. Simple linear regression
b. Multiple regression
b. Multiple regression
How can we obtain a measure of how much variance in the outcome variable (y) the predictor variables (x1 and x2) combined explain?
By constructing a model which incorporates the slopes of each predictor variable
We also obtain measures of how much variance in the outcome variable (y) our predictor variables (x1 and x2) explain when considered separately
a. Simple linear regression
b. Multiple regression
b. Multiple regression
What are the 2 things we obtain through multiple regression?
- A measure of how much variance in the outcome variable (y) the predictor variables (x1 and x2) combined explain
- Measures of how much variance in the outcome variable (y) our predictor variables (x1 and x2) explain when considered separately
What is the formula for the (multiple) regression equation?
y = b1x1 + b2x2 + b3x3… + a
Proposing a model to explain the relationship between all predictors (x1,x2) and y
This is known as…?
Multiple regression
What are the 3 stages of multiple regression?
- Simple regression: proposing a model to explain the relationship between each individual predictor and y (x1 and y, x2 and y, etc.)
- Multiple regression: proposing a model to explain the relationship between all predictors (x1, x2) and y
- Evaluating the model: goodness-of-fit
What are the 5 assumptions of multiple regression?
- Sufficient sample size
- Linearity
- Absence of outliers
- Multicollinearity
- Normality, linearity and homoscedasticity, independence of residuals
What is the sufficient sample size assumption of a multiple regression?
To look at the combined effect of several predictors:
N > 50 + 8M
e.g. for 3 predictor variables you need at least 74 participants
To look at the separate effects of several predictors:
N > 104 + M
e.g. for 3 predictor variables you need at least 107 participants
Too few participants may result in over-optimistic results (results may not be generalisable)
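The two rules of thumb above (M = number of predictor variables) as a sketch; the function names are illustrative:

```python
def n_for_combined_effects(m):
    # Combined effect of several predictors: N > 50 + 8M
    return 50 + 8 * m

def n_for_separate_effects(m):
    # Separate effects of several predictors: N > 104 + M
    return 104 + m

print(n_for_combined_effects(3), n_for_separate_effects(3))   # → 74 107
```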
What is the linearity assumption of a multiple regression?
Predictor variables should be linearly related to the
outcome variable
What is the absence of outliers assumption of a multiple regression?
Regression, like correlation, is extremely sensitive to outliers
It may be appropriate to remove such data points
What is the multicollinearity assumption of a multiple regression?
Ideally, predictor variables will be correlated with the outcome variable but not with one another
Check the correlation matrix before performing the regression analysis
Predictor variables which are highly correlated with one another (r = .9 and above) are measuring much the same thing
It may be appropriate to combine the correlated predictor variables or to remove one
What is the normality of residuals assumption of a multiple regression?
Residuals should be normally distributed around the predicted outcome
What is the linearity of residuals assumption of a multiple regression?
Residuals should have a straight line relationship with the outcome
What is the homoscedasticity of residuals assumption of a multiple regression?
Variance of residuals about the outcome should be the same for all predicted scores
Residuals should have a straight line relationship with the outcome
a. Homoscedasticity of residuals
b. Normality of residuals
c. Linearity of residuals
c. Linearity of residuals
Residuals should be normally distributed around the predicted outcome
a. Homoscedasticity of residuals
b. Normality of residuals
c. Linearity of residuals
b. Normality of residuals
Variance of residuals about the outcome should be the same for all predicted scores
a. Homoscedasticity of residuals
b. Normality of residuals
c. Linearity of residuals
a. Homoscedasticity of residuals
How do we check the assumptions for multiple regression have been met?
List 2 points
- Scatterplots
- Correlation matrix
How do we know the assumptions for multiple regression have been met based on a Normal P-P Plot of Regression Standardized Residual results?
Data points will lie in a reasonably straight diagonal line, from bottom left to top right
This would suggest no major deviations from normality
How do we know the homoscedasticity assumption for multiple regression have been met based on a scatterplot of Regression Standardized Residual results?
Residuals will be roughly rectangularly distributed, with most scores concentrated in the centre (0)
We don't want to see systematic patterns in the residuals (curvilinear, or higher on one side)
What are considered outliers on a scatterplot of Multiple Regression Standardized Residual results?
Standardised residuals > 3.3 or < -3.3
B
age = 1.067
naughty list rating = -8.962
Beta
age = .492
naughty list rating = -.294
What interpretations can be made from this output?
For every 1 year increase in age, Christmas joy increases by 1.07 points
As age increases by 1 SD, Christmas joy increases by 0.49 SDs
For every 1 point higher on the naughty list rating, Christmas joy decreases by 8.96 points
As naughty list rating increases by 1SD, Christmas joy decreases by 0.29 SDs
B
age = 1.067
naughty list rating = -8.962
Beta
age = .492
naughty list rating = -.294
a= 62.612
How much Christmas joy would you predict for a 41 year-old who scores 5.8 on Santa’s naughty list for 2020?
y = b1x1 + b2x2 + a
y = (1.067 * age) + (-8.962 * naughty) + 62.612
y = (1.067 * 41) + (-8.962 * 5.8) + 62.612
y = 43.747 + (-51.9796) + 62.612
y = 54.3794
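The Christmas-joy calculation above, sketched in Python (coefficients and values taken from the card; the function name is illustrative):

```python
def predict_multi(bs, xs, a):
    # y = b1*x1 + b2*x2 + ... + a
    return sum(b * x for b, x in zip(bs, xs)) + a

# age = 41, naughty list rating = 5.8
joy = predict_multi(bs=[1.067, -8.962], xs=[41, 5.8], a=62.612)
print(round(joy, 4))   # → 54.3794
```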
Assesses the significance of each predictor separately
This is known as…?
t-values
What do the t-values in multiple regression output on SPSS tell us?
How much each individual predictor, separately, improves the prediction of y
How much each individual predictor, separately, improves the prediction of y
a. P-value
b. F-value
c. t-value
d. M-value
c. t-value
What is the null hypothesis of multiple regression model?
The regression model and the simplest model are equal (in terms of predicting y)