Regression Flashcards

1
Q

When is linear regression used?

A

Used when the relationship between variables x and y can be described with a straight line

2
Q

Determines the strength of the relationship between x and y but it doesn’t tell us how much y changes based on a given change in x

a. Correlation
b. Regression

A

a. Correlation

3
Q

Define correlation

A

Determines the strength of the relationship between x and y but it doesn’t tell us how much y changes based on a given change in x

4
Q

Determines the strength of the relationship between x and y and tells us how much y changes based on a given change in x

a. Correlation
b. Regression

A

b. Regression

5
Q

Define regression

A

Determines the strength of the relationship between x and y and tells us how much y changes based on a given change in x

6
Q

By proposing a model of the relationship between x and y, regression allows us to …?

A

Estimate how much y will change as a result of a given change in x

7
Q

Estimate how much y will change as a result of a given change in x

a. Correlation
b. Regression

A

b. Regression

8
Q

Distinguishes between the variable being predicted and the variable(s) used to predict

a. Correlation
b. Regression

A

b. Regression

9
Q

True or False?

Correlation distinguishes between the variable being predicted and the variable(s) used to predict

A

False

Regression distinguishes between the variable being predicted and the variable(s) used to predict

10
Q

How many predictor variables are in a simple linear regression?

A

There is only one predictor variable

11
Q

What is the variable that is being predicted?

a. x
b. y

A

b. y

12
Q

What is the outcome variable?

a. x
b. y

A

b. y

13
Q

What is the predictor variable?

a. x
b. y

A

a. x

14
Q

What is the variable that is used to predict?

a. x
b. y

A

a. x

15
Q

y is…?

a. The criterion variable
b. The dependent variable
c. The outcome variable
d. The predictor variable
e. The independent variable
f. The explanatory variable

A

a. The criterion variable
b. The dependent variable
c. The outcome variable

16
Q

x is…?

a. The predictor variable
b. The dependent variable
c. The independent variable
d. The criterion variable
e. The outcome variable
f. The explanatory variable

A

a. The predictor variable
c. The independent variable
f. The explanatory variable

17
Q

Why might researchers use regression?

List 3 reasons

A
  1. To investigate the strength of the effect x has on y
  2. To estimate how much y will change as a result of a given change in x
  3. To predict a future value of y, based on a known value of x
18
Q

Makes the assumption that y is (to some extent) dependent on x

a. Correlation
b. Regression

A

b. Regression

19
Q

True or False?

The dependence of y on x will always reflect causal dependency

A

False

The dependence of y on x may or may not reflect causal dependency

20
Q

True or False?

Regression provides direct evidence of causality

A

False

Regression does not provide direct evidence of causality

21
Q

Linear regression consists of 3 stages

What are they?

A
  1. Analysing the relationship between variables
  2. Proposing a model to explain that relationship
  3. Evaluating the model
22
Q
  1. Analysing the relationship between variables
  2. Proposing a model to explain that relationship
  3. Evaluating the model

These are stages of…?

a. Regression
b. Correlation
c. ANOVA
d. t-test

A

a. Regression

23
Q

The first stage of regression involves analysing the relationship between variables

How do we do this?

A

By determining the strength and direction of the relationship (equivalent to correlation)

24
Q

The second stage of regression involves proposing a model to explain the relationship

How do we do this?

A

By drawing the line of best-fit (regression line)
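The line of best-fit need not be drawn by eye; a minimal sketch of computing it with ordinary least squares (numpy assumed; `fit_line` and the toy data are hypothetical):

```python
import numpy as np

def fit_line(x, y):
    """Ordinary least squares for one predictor: y = b*x + a."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    # slope: covariance of x and y divided by variance of x
    b = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
    # intercept: the line passes through the point of means (x̄, ȳ)
    a = y.mean() - b * x.mean()
    return b, a

b, a = fit_line([1, 2, 3, 4], [2, 4, 6, 8])  # perfectly linear toy data
# for this data the slope is 2.0 and the intercept is 0.0
```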

25
The third stage of regression involves evaluating the model
How do we do this?
By assessing the goodness of fit of the line of best-fit
26
What is the intercept?
Value of y when x is 0
27
What is the slope?
How much y changes as a result of a 1 unit increase in x
28
How much y changes as a result of a 1 unit increase in x
This is known as...?
a. The slope
b. The intercept
a. The slope
29
Value of y when x is 0
This is known as...?
a. The slope
b. The intercept
b. The intercept
30
Assumes no relationship between x and y (b = 0)
a. Best model
b. Simplest model
b. Simplest model
31
Based on the relationship between x and y
a. Best model
b. Simplest model
a. Best model
32
Consists of the regression line
a. Best model
b. Simplest model
a. Best model
33
Consists of a flat, horizontal line
a. Best model
b. Simplest model
b. Simplest model
34
What does the simplest model assume?
Assumes no relationship between x and y (b=0)
35
What is the best model based on?
Based on the relationship between x and y
36
How do we calculate the goodness of fit for the simplest model?
Refer to the total variance
37
Variance not explained by the mean of y
a. Best model
b. Simplest model
b. Simplest model
38
Variance not explained by the regression line
a. Best model
b. Simplest model
a. Best model
39
What is the residual variance for the best model?
Variance not explained by the regression line
40
What is the total variance for the simplest model?
Variance not explained by the mean of y
41
How do we calculate the goodness of fit for the best model?
Refer to the residual variance
42
Calculate goodness of fit using residual variance
a. Best model
b. Simplest model
a. Best model
43
Calculate goodness of fit using total variance
a. Best model
b. Simplest model
b. Simplest model
44
The difference between the observed values of y and the mean of y, i.e. the variance in y not explained by the simplest model (b = 0)
a. SST
b. SSR
a. SST
45
SST is...?
a. The difference between the observed values of y and those predicted by the regression line
b. The difference between the observed values of y and the mean of y
b. The difference between the observed values of y and the mean of y
46
The difference between the observed values of y and those predicted by the regression line, i.e. the variance in y not explained by the regression model
a. SST
b. SSR
b. SSR
47
What does the difference between SST and SSR reflect?
Reflects the improvement in prediction using the regression model compared to the simplest model, i.e. the reduction in unexplained variance using the regression model compared to the simplest model
48
What is the formula to calculate SSM?
SSM = SST - SSR
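These sums of squares are easy to compute directly; a minimal sketch, assuming numpy and hypothetical observed/predicted values:

```python
import numpy as np

y = np.array([1.0, 3.0, 2.0, 5.0])      # observed outcome (hypothetical)
y_hat = np.array([1.5, 2.5, 3.0, 4.5])  # values predicted by the regression line (hypothetical)

sst = np.sum((y - y.mean()) ** 2)  # variance not explained by the simplest model (mean of y)
ssr = np.sum((y - y_hat) ** 2)     # variance not explained by the regression line
ssm = sst - ssr                    # improvement in prediction due to the model
```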
49
The larger the SSM, the _______ the improvement in prediction using the regression model over the simplest model
a. Smaller
b. Bigger
b. Bigger
50
The larger the SSM, the bigger the ...?
Improvement in prediction using the regression model over the simplest model
51
The _____ the SSM, the bigger the improvement in prediction using the regression model over the simplest model
a. Larger
b. Smaller
a. Larger
52
How do we evaluate the improvement due to the model (SSM), relative to the variance the model does not explain (SSR)?
Use an F-test (ANOVA)
53
What do we use an F-test (ANOVA) for when assessing the goodness of fit for a regression?
To evaluate the improvement due to the model (SSM), relative to the variance the model does not explain (SSR)
54
The improvement due to the model is known as...?
a. SSM
b. SSR
a. SSM
55
The variance the model does not explain is known as...?
a. SSM
b. SSR
b. SSR
56
Rather than using the Sums of Squares (SS) values, the F-test uses ...?
Mean Squares (MS) values
57
True or False?
The F-test uses Sums of Squares (SS) values
False
The F-test uses Mean Squares (MS) values
58
True or False?
The F-test uses Mean Squares (MS) values, which do not take the degrees of freedom into account
False
The F-test uses Mean Squares (MS) values, which take the degrees of freedom into account
59
What is the formula for MSM?
MSM = SSM / dfM
60
What is the formula for MSR?
MSR = SSR / dfR
61
Provides a measure of how much the model has improved the prediction of y, relative to the level of inaccuracy of the model
This is known as...?
F-ratio
62
What is an F-ratio in regression?
Provides a measure of how much the model has improved the prediction of y, relative to the level of inaccuracy of the model
63
F-ratio provides a measure of how much the model has improved the prediction of y, relative to ...?
The level of inaccuracy of the model
64
What is the formula for the F-ratio in regression?
F = MSM / MSR
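Putting the pieces together, the F-ratio follows from the mean squares; a minimal sketch assuming one predictor (so dfM = 1 and dfR = n - 2) and hypothetical sums of squares:

```python
n = 20                 # sample size (hypothetical)
ssm, ssr = 70.0, 36.0  # sums of squares (hypothetical)

df_m = 1      # one predictor, so one model degree of freedom
df_r = n - 2  # n minus the number of estimated parameters (slope and intercept)

msm = ssm / df_m  # mean square for the model
msr = ssr / df_r  # mean square for the residuals
f = msm / msr     # improvement due to the model relative to its inaccuracy
```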
65
If the regression model is good at predicting y (relative to the simplest model, i.e. b = 0), the improvement in prediction due to the model (MSM) will be ______, while the level of inaccuracy of the model (MSR) will be _____
a. Small, small
b. Small, large
c. Large, large
d. Large, small
d. Large, small
66
If the regression model is good at predicting y (relative to the simplest model, i.e. b = 0), the improvement in prediction due to the model (MSM) will be _____
a. Large
b. Small
c. Medium
d. 0
a. Large
67
If the regression model is good at predicting y (relative to the simplest model, i.e. b = 0), the level of inaccuracy of the model (MSR) will be ______
a. Large
b. Small
c. Medium
d. 0
b. Small
68
What does it mean when the F-value is further from 0?
The regression model is good at predicting y
69
What is the null hypothesis for the regression model?
The regression model and the simplest model are equal (in terms of predicting y), i.e. MSM = 0
70
What is the formula for the regression equation (predicting y from x)?
y = bx + a or y = slope (x) + intercept
71
What is the formula used for predicting y from x?
y = bx + a or y = slope (x) + intercept
72
Calculate y from x when:
a = 2.76
b = 1.06
x = 6 miles
y = bx + a
y = (1.06 * 6) + 2.76
y = 6.36 + 2.76
y = 9.12
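This arithmetic can be checked in a couple of lines of Python:

```python
a, b, x = 2.76, 1.06, 6  # values from the card
y = b * x + a            # regression equation y = bx + a
# y comes out at 9.12 (up to floating-point rounding)
```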
73
True or False?
If x and y are negatively correlated, the value of b in y = bx + a will be positive
False
If x and y are negatively correlated, the value of b in y = bx + a will be negative
74
What are the 3 assumptions of regression?
1. Linearity: x and y must be linearly related
2. Absence of outliers
3. Normality, linearity, homoscedasticity and independence of residuals
75
What is the linearity assumption for regression?
x and y must be linearly related, i.e. the relationship between x and y can be described by a straight line
76
What is the absence of outliers assumption for regression?
Regression, like correlation, is extremely sensitive to outliers
It may be appropriate to remove such data points
77
What is the normality of residuals assumption for regression?
Residuals should be normally distributed around the predicted outcome
78
What is the linearity of residuals assumption for regression?
Residuals should have a straight line relationship with the outcome
79
What is the homoscedasticity of residuals assumption for regression?
Variance of residuals about the outcome should be the same for all predicted scores
80
What is the non-parametric equivalent for regression?
There is none
Instead, we can attempt a fix
81
What do we check to determine whether the assumptions for regression have been met?
Refer to a scatterplot
82
How do we know the assumptions for regression have been met based on a Normal P-P Plot of Regression Standardized Residual results?
Data points will lie in a reasonably straight diagonal line, from bottom left to top right
This would suggest no major deviations from normality
83
Data points will lie in a reasonably straight diagonal line, from bottom left to top right
What does this suggest?
No major deviations from normality
84
How do we know the homoscedasticity assumption for regression have been met based on a scatterplot of Regression Standardized Residual results?
Residuals will be roughly rectangularly distributed, with most scores concentrated in the centre (0)
We don't want to see systematic patterns in the residuals (curvilinear, or higher on one side)
85
What are considered outliers on a scatterplot of Regression Standardized Residual results?
Standardised residuals > 3.3 or < -3.3
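Flagging such outliers is straightforward once residuals are standardised; a minimal sketch (numpy assumed; the data is hypothetical, and standardising against the residuals' own mean and SD is a simplification of what SPSS reports):

```python
import numpy as np

# 14 well-behaved residuals and one extreme value (hypothetical data)
residuals = np.append(np.zeros(14), 15.0)

# standardise: centre on the mean, scale by the standard deviation
z = (residuals - residuals.mean()) / residuals.std()

# flag standardised residuals beyond the +/-3.3 cut-off
outliers = np.abs(z) > 3.3
```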
86
What does R^2 measure?
The proportion of variance explained by the model
87
The proportion of variance explained by the model is measured by...?
R^2
88
What does R measure?
The strength of the relationship between x and y (equivalent to r if there is only 1 predictor variable)
89
The strength of the relationship between x and y (equivalent to r if there is only 1 predictor variable) is measured by...?
R
90
What does adjusted R^2 measure?
Adjusted R^2 has been adjusted to account for the degrees of freedom (number of participants and number of parameters being estimated)
91
If we wanted to use the regression model to generalise the results of our sample to the population, which one would we use?
a. R
b. R^2
c. Adjusted R^2
c. Adjusted R^2
92
Why do we use adjusted R^2 when using the regression model to generalise the results of our sample to the population?
Because R^2 is too optimistic
Adjusted R^2 has been adjusted to account for the degrees of freedom (number of participants and number of parameters being estimated)
93
How do we report the F-value based on SPSS results?
F(df Regression, df Residual) = F-value, p = p-value
94
Where do we find the intercept (a) and the slope (b) on the SPSS output?
Look at the 'Coefficients' table
The intercept (a) is in the '(Constant)' row under the 'B' column
The slope (b) is in the row below the '(Constant)' row under the 'B' column
95
The slope converted to a standardised score is known as...?
Beta
96
What is Beta?
The slope converted to a standardised score
97
What is the t-value equivalent to?
√F when we only have 1 predictor variable
98
An equivalent to √F when we only have 1 predictor variable is known as...?
t-value
99
t-statistic tests the null hypothesis that ...?
The value of b is 0
100
What does the same job as the F-test when we have just one predictor variable?
t-statistic test
101
The amount of variance in y explained by the model (SSM), relative to the total variance in y (SST)
This is known as...?
R^2
102
What is R^2?
The amount of variance in y explained by the model (SSM), relative to the total variance in y (SST)
103
What is the formula for R^2?
R^2 = SSM / SST
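With a single predictor, R^2 computed from the sums of squares matches the squared Pearson correlation; a minimal sketch with numpy and hypothetical toy data:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.1, 5.9, 8.2, 9.8])  # roughly linear toy data

# fit the least-squares line, then compute predicted values
b = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
a = y.mean() - b * x.mean()
y_hat = b * x + a

sst = np.sum((y - y.mean()) ** 2)  # total variance in y
ssr = np.sum((y - y_hat) ** 2)     # variance unexplained by the model
r_squared = (sst - ssr) / sst      # R^2 = SSM / SST

r = np.corrcoef(x, y)[0, 1]        # Pearson r
# with one predictor, r ** 2 equals r_squared
```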
104
How can we express R^2 as a %?
Multiply by 100
105
If R^2 = .32, we would conclude that the regression model explained ____% of the variance in y
32
106
What does r^2 measure in a correlation?
The proportion of shared variance between two variables
107
In regression, we assume that x _____ the variance in y
Explains
108
r^2 is equivalent to R^2 if ...?
We have only 1 predictor
109
√R^2 = r if ...?
We have only 1 predictor
110
If we have only 1 predictor, how do we calculate r from R^2?
√R^2
111
If we have only 1 predictor, what is r^2 equivalent to?
R^2
112
True or False? In regression, we assume that y explains the variance in x
False
In regression, we assume that x explains the variance in y
113
Assesses how much y will change as a result of a given change in x
a. Simple linear regression
b. Multiple regression
a. Simple linear regression
114
Assesses the influence of several predictor variables (e.g. x1, x2, x3 etc.) on the outcome variable (y)
a. Simple linear regression
b. Multiple regression
b. Multiple regression
115
What does multiple regression allow us to do?
Assess the influence of several predictor variables (e.g. x1, x2, x3 etc…) on the outcome variable (y)
116
We obtain a measure of how much variance in the outcome variable (y) the predictor variables (x1 and x2) combined explain
a. Simple linear regression
b. Multiple regression
b. Multiple regression
117
How can we obtain a measure of how much variance in the outcome variable (y) the predictor variables (x1 and x2) combined explain?
By constructing a model which incorporates the slopes of each predictor variable
118
We also obtain measures of how much variance in the outcome variable (y) our predictor variables (x1 and x2) explain when considered separately
a. Simple linear regression
b. Multiple regression
b. Multiple regression
119
What are the 2 things we obtain through multiple regression?
1. A measure of how much variance in the outcome variable (y) the predictor variables (x1 and x2) combined explain
2. Measures of how much variance in the outcome variable (y) our predictor variables (x1 and x2) explain when considered separately
120
What is the formula for the (multiple) regression equation?
y = b1x1 + b2x2 + b3x3… + a
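The multiple regression equation translates directly into code; a minimal sketch with hypothetical coefficients (b1, b2 and a are made up for illustration):

```python
# hypothetical fitted values: two predictors plus an intercept
b1, b2, a = 0.5, -2.0, 10.0

def predict(x1, x2):
    """Multiple regression equation: y = b1*x1 + b2*x2 + a."""
    return b1 * x1 + b2 * x2 + a

y = predict(4.0, 1.5)  # 0.5*4 - 2.0*1.5 + 10 = 9.0
```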
121
Proposing a model to explain the relationship between all predictors (x1, x2) and y
This is known as...?
Multiple regression
122
What are the 3 stages of multiple regression?
1. Proposing a model to explain the relationship between x1 and y, a model for x2 and y, etc.
2. Multiple regression: proposing a model to explain the relationship between all predictors (x1, x2) and y
3. Evaluating the model: goodness-of-fit
123
What are the 5 assumptions of multiple regression?
1. Sufficient sample size
2. Linearity
3. Absence of outliers
4. Multicollinearity
5. Normality, linearity, homoscedasticity and independence of residuals
124
What is the sufficient sample size assumption of a multiple regression?
To look at the combined effect of several predictors: N > 50 + 8M (e.g. for 3 predictor variables you need at least 74 Ps)
To look at the separate effects of several predictors: N > 104 + M (e.g. for 3 predictor variables you need at least 107 Ps)
Too few participants may result in over-optimistic results (results may not be generalisable)
125
What is the linearity assumption of a multiple regression?
Predictor variables should be linearly related to the outcome variable
126
What is the absence of outliers assumption of a multiple regression?
Regression, like correlation, is extremely sensitive to outliers
It may be appropriate to remove such data points
127
What is the multicollinearity assumption of a multiple regression?
Ideally, predictor variables will be correlated with the outcome variable but not with one another
Check the correlation matrix before performing the regression analysis
Predictor variables which are highly correlated with one another (r = .9 and above) are measuring much the same thing
It may be appropriate to combine the correlated predictor variables or to remove one
128
What is the normality of residuals assumption of a multiple regression?
Residuals should be normally distributed around the predicted outcome
129
What is the linearity of residuals assumption of a multiple regression?
Residuals should have a straight line relationship with the outcome
130
What is the homoscedasticity of residuals assumption of a multiple regression?
Variance of residuals about the outcome should be the same for all predicted scores
131
Residuals should have a straight line relationship with the outcome
a. Homoscedasticity of residuals
b. Normality of residuals
c. Linearity of residuals
c. Linearity of residuals
132
Residuals should be normally distributed around the predicted outcome
a. Homoscedasticity of residuals
b. Normality of residuals
c. Linearity of residuals
b. Normality of residuals
133
Variance of residuals about the outcome should be the same for all predicted scores
a. Homoscedasticity of residuals
b. Normality of residuals
c. Linearity of residuals
a. Homoscedasticity of residuals
134
How do we check that the assumptions for multiple regression have been met? List 2 points
1. Scatterplots
2. Correlation matrix
135
How do we know the assumptions for multiple regression have been met based on a Normal P-P Plot of Regression Standardized Residual results?
Data points will lie in a reasonably straight diagonal line, from bottom left to top right
This would suggest no major deviations from normality
136
How do we know the homoscedasticity assumption for multiple regression has been met based on a scatterplot of Regression Standardized Residual results?
Residuals will be roughly rectangularly distributed, with most scores concentrated in the centre (0)
We don't want to see systematic patterns in the residuals (curvilinear, or higher on one side)
137
What are considered outliers on a scatterplot of Multiple Regression Standardized Residual results?
Standardised residuals > 3.3 or < -3.3
138
B: age = 1.067, naughty list rating = -8.962
Beta: age = .492, naughty list rating = -.294
What interpretations can be made from this data?
For every 1 year increase in age, Christmas joy increases by 1.07 points
As age increases by 1 SD, Christmas joy increases by 0.49 SDs
For every 1 point higher on the naughty list rating, Christmas joy decreases by 8.96 points
As naughty list rating increases by 1 SD, Christmas joy decreases by 0.29 SDs
139
B: age = 1.067, naughty list rating = -8.962
Beta: age = .492, naughty list rating = -.294
a = 62.612
How much Christmas joy would you predict for a 41 year-old who scores 5.8 on Santa's naughty list for 2020?
y = b1x1 + b2x2 + a
y = (1.067 * 41) + (-8.962 * 5.8) + 62.612
y = 43.747 - 51.9796 + 62.612
y = 54.3794
140
Assesses the significance of each predictor separately
This is known as...?
t-values
141
What do the t-values in multiple regression output on SPSS tell us?
How much each individual predictor, separately, improves the prediction of y
142
How much each individual predictor, separately, improves the prediction of y
a. P-value
b. F-value
c. t-value
d. M-value
c. t-value
143
What is the null hypothesis of the multiple regression model?
The predictor and the simplest model are equal (in terms of predicting y)
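The Christmas-joy prediction worked through above can be verified numerically (coefficients taken from the card; `christmas_joy` is a hypothetical helper name):

```python
# coefficients from the worked example: two predictors plus an intercept
b_age, b_naughty, a = 1.067, -8.962, 62.612

def christmas_joy(age, naughty):
    """Multiple regression prediction: y = b1*x1 + b2*x2 + a."""
    return b_age * age + b_naughty * naughty + a

y = christmas_joy(41, 5.8)
# y is approximately 54.38, matching the card's answer of 54.3794
```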