Regression Flashcards

1
Q

When is linear regression used?

A

Used when the relationship between variables x and y can be described with a straight line

2
Q

Determines the strength of the relationship between x and y but it doesn’t tell us how much y changes based on a given change in x

a. Correlation
b. Regression

A

a. Correlation

3
Q

Define correlation

A

Determines the strength of the relationship between x and y but it doesn’t tell us how much y changes based on a given change in x

4
Q

Determines the strength of the relationship between x and y and tells us how much y changes based on a given change in x

a. Correlation
b. Regression

A

b. Regression

5
Q

Define regression

A

Determines the strength of the relationship between x and y and tells us how much y changes based on a given change in x

6
Q

By proposing a model of the relationship between x and y, regression allows us to …?

A

Estimate how much y will change as a result of a given change in x

7
Q

Estimate how much y will change as a result of a given change in x

a. Correlation
b. Regression

A

b. Regression

8
Q

Distinguishes between the variable being predicted and the variable(s) used to predict

a. Correlation
b. Regression

A

b. Regression

9
Q

True or False?

Correlation distinguishes between the variable being predicted and the variable(s) used to predict

A

False

Regression distinguishes between the variable being predicted and the variable(s) used to predict

10
Q

How many predictor variables are in a simple linear regression?

A

There is only one predictor variable

11
Q

What is the variable that is being predicted?

a. x
b. y

A

b. y

12
Q

What is the outcome variable?

a. x
b. y

A

b. y

13
Q

What is the predictor variable?

a. x
b. y

A

a. x

14
Q

What is the variable that is used to predict?

a. x
b. y

A

a. x

15
Q

y is…?

a. The criterion variable
b. The dependent variable
c. The outcome variable
d. The predictor variable
e. The independent variable
f. The explanatory variable

A

a. The criterion variable
b. The dependent variable
c. The outcome variable

16
Q

x is…?

a. The predictor variable
b. The dependent variable
c. The independent variable
d. The criterion variable
e. The outcome variable
f. The explanatory variable

A

a. The predictor variable
c. The independent variable
f. The explanatory variable

17
Q

Why might researchers use regression?

List 3 reasons

A
  1. To investigate the strength of the effect x has on y
  2. To estimate how much y will change as a result of a given change in x
  3. To predict a future value of y, based on a known value of x
18
Q

Makes the assumption that y is (to some extent) dependent on x

a. Correlation
b. Regression

A

b. Regression

19
Q

True or False?

The dependence of y on x will always reflect causal dependency

A

False

The dependence of y on x may or may not reflect causal dependency

20
Q

True or False?

Regression provides direct evidence of causality

A

False

Regression does not provide direct evidence of causality

21
Q

Linear regression consists of 3 stages

What are they?

A
  1. Analysing the relationship between variables
  2. Proposing a model to explain that relationship
  3. Evaluating the model
22
Q
  1. Analysing the relationship between variables
  2. Proposing a model to explain that relationship
  3. Evaluating the model

These are stages of…?

a. Regression
b. Correlation
c. ANOVA
d. t-test

A

a. Regression

23
Q

The first stage of regression involves analysing the relationship between variables

How do we do this?

A

By determining the strength and direction of the relationship (equivalent to correlation)

24
Q

The second stage of regression involves proposing a model to explain the relationship

How do we do this?

A

By drawing the line of best-fit (regression line)

25
Q

The third stage of regression involves evaluating the model to explain that relationship

How do we do this?

A

By assessing the goodness of the line of best-fit

26
Q

What is the intercept?

A

Value of y when x is 0

27
Q

What is the slope?

A

How much y changes as a result of a 1 unit increase in x

28
Q

How much y changes as a result of a 1 unit increase in x

This is known as…?

a. The slope
b. The intercept

A

a. The slope

29
Q

Value of y when x is 0

This is known as…?

a. The slope
b. The intercept

A

b. The intercept

30
Q

Assumes no relationship between x and y (b=0)

a. Best model
b. Simplest model

A

b. Simplest model

31
Q

Based on the relationship between x and y

a. Best model
b. Simplest model

A

a. Best model

32
Q

Consists of the regression line

a. Best model
b. Simplest model

A

a. Best model

33
Q

Consists of a flat, horizontal line

a. Best model
b. Simplest model

A

b. Simplest model

34
Q

What does the simplest model assume?

A

Assumes no relationship between x and y (b=0)

35
Q

What is the best model based on?

A

Based on the relationship between x and y

36
Q

How do we calculate the goodness of fit in the simplest model regression?

A

Refer to the total variance

37
Q

Variance not explained by the mean of y

a. Best model
b. Simplest model

A

b. Simplest model

38
Q

Variance not explained by the regression line

a. Best model
b. Simplest model

A

b. Simplest model

39
Q

What is the residual variance for the best model?

A

Variance not explained by the regression line

40
Q

What is the total variance for the simplest model?

A

Variance not explained by the mean of y

41
Q

How do we calculate the goodness of fit in the best model regression?

A

Refer to the residual variance

42
Q

Calculate goodness of fit using residual variance

a. Best model
b. Simplest model

A

a. Best model

43
Q

Calculate goodness of fit using total variance

a. Best model
b. Simplest model

A

b. Simplest model

44
Q

The difference between the observed values of y and the mean of y

i.e. the variance in y not explained by the simplest model (b = 0)

a. SST
b. SSR

A

a. SST

45
Q

SST is…?

a. The difference between the observed values of y and those predicted by the regression line

b. The difference between the observed values of y and the mean of y

A

b. The difference between the observed values of y and the mean of y

46
Q

The difference between the observed values of y and those predicted by the regression line

i.e. the variance in y not explained by the regression model

a. SST
b. SSR

A

b. SSR

47
Q

What does the difference between SST and SSR reflect?

A

Reflects the improvement in prediction using the regression model compared to the simplest model

i.e. the reduction in unexplained variance using the regression model compared to the simplest model

48
Q

What is the formula to calculate SSM?

A

SSM = SST - SSR

49
Q

The larger the SSM, the _______ the improvement in prediction using the regression model over the simplest model

a. Smaller
b. Bigger

A

b. Bigger

50
Q

The larger the SSM, the bigger the …?

A

Improvement in prediction using the regression model over the simplest model

51
Q

The _____ the SSM, the bigger the improvement in prediction using the regression model over the simplest model

a. Larger
b. Smaller

A

a. Larger

52
Q

How do we evaluate the improvement due to the model (SSM), relative to the variance the model does not explain (SSR)?

A

Use an F-test (ANOVA)

53
Q

What do we use an F-test (ANOVA) for when assessing the goodness of fit for a regression?

A

To evaluate the improvement due to the model (SSM), relative to the variance the model does not explain (SSR)

54
Q

The improvement due to the model is known as…?

a. SSM
b. SSR

A

a. SSM

55
Q

The variance the model does not explain is known as…?

a. SSM
b. SSR

A

b. SSR

56
Q

Rather than using the Sums of Squares (SS) values, the F-test uses …?

A

Mean Squares (MS) values

57
Q

True or False?

F-test uses Sums of Squares (SS) values

A

False

F-test uses Mean Squares (MS) values

58
Q

True or False?

F-test uses Mean Squares (MS) values, which do not take the degrees of freedom into account

A

False

F-test uses Mean Squares (MS) values, which take the degrees of freedom into account

59
Q

What is the formula for MSM?

A

MSM = SSM / dfM

60
Q

What is the formula for MSR?

A

MSR = SSR / dfR

61
Q

Provides a measure of how much the model has improved the prediction of y, relative to the level of inaccuracy of the model

This is known as…?

A

F-ratio

62
Q

What is an F-ratio in regression?

A

Provides a measure of how much the model has improved the prediction of y, relative to the level of inaccuracy of the model

63
Q

F-ratio provides a measure of how much the model has improved the prediction of y, relative to …?

A

The level of inaccuracy of the model

64
Q

What is the formula for the F-ratio in regression?

A

F = MSM / MSR
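As a worked sketch (my own made-up data, not from the cards), the pieces above — SST, SSR, SSM, the MS values and the F-ratio — can be computed by hand in Python:

```python
# A hand-rolled simple linear regression on made-up data, to show how
# SST, SSR, SSM, MSM, MSR and F fit together. Illustrative only.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 8.0, 9.8]
n = len(xs)

x_bar = sum(xs) / n
y_bar = sum(ys) / n

# Least-squares slope (b) and intercept (a)
b = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) \
    / sum((x - x_bar) ** 2 for x in xs)
a = y_bar - b * x_bar

predicted = [b * x + a for x in xs]

SST = sum((y - y_bar) ** 2 for y in ys)                 # simplest model (b = 0)
SSR = sum((y - p) ** 2 for y, p in zip(ys, predicted))  # regression model
SSM = SST - SSR                                         # improvement due to the model

dfM = 1      # one predictor
dfR = n - 2  # n minus the two estimated parameters (a and b)

MSM = SSM / dfM
MSR = SSR / dfR
F = MSM / MSR  # large F => model much better than the simplest model
```

With these data F comes out very large, because the points lie almost exactly on a line.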

65
Q

If the regression model is good at predicting y (relative to the simplest model, i.e. b = 0), the improvement in prediction due to the model (MSM) will be ______ , while the level of inaccuracy of the model (MSR) will be _____

a. Small, small
b. Small, Large
c. Large, Large
d. Large, small

A

d. Large, small

66
Q

If the regression model is good at predicting y (relative to the simplest model, i.e. b= 0), the improvement in prediction due to the model (MSM) will be _____

a. Large
b. Small
c. Medium
d. 0

A

a. Large

67
Q

If the regression model is good at predicting y (relative to the simplest model, i.e. b = 0), the level of inaccuracy of the model (MSR) will be ______

a. Large
b. Small
c. Medium
d. 0

A

b. Small

68
Q

What does it mean when the F-value is further from 0?

A

The regression model is good at predicting y

69
Q

What is the null hypothesis for the regression model?

A

The regression model and the simplest model are equal (in terms of predicting y)

MSM = 0

70
Q

What is the formula for the regression equation (predicting y from x)?

A

y = bx + a

or

y = slope (x) + intercept

71
Q

What is the formula used for predicting y from x?

A

y = bx + a

or

y = slope (x) + intercept

72
Q

Calculate y from x when…?

a = 2.76
b = 1.06
x = 6 miles

A

y = bx + a
y = (1.06 * x) + 2.76
y = (1.06 * 6) + 2.76
y = 6.36 + 2.76
y = 9.12
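The same working, sketched in Python (values taken from this card):

```python
# Predicting y from x with y = bx + a, using the values on this card
a = 2.76   # intercept
b = 1.06   # slope
x = 6      # miles

y = b * x + a   # (1.06 * 6) + 2.76 = 9.12
```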

73
Q

True or False?

If x and y are negatively correlated the value of b in y = bx + a will be positive

A

False

If x and y are negatively correlated the value of b in y = bx + a will be negative

74
Q

What are the 3 assumptions of regression?

A
  1. Linearity: x and y must be linearly related
  2. Absence of outliers
  3. Normality, linearity and homoscedasticity, independence of residuals
75
Q

What is the linearity assumption for regression?

A

x and y must be linearly related

The relationship between x and y can be described by a straight line

76
Q

What is the absence of outliers assumption for regression?

A

Regression, like correlation, is extremely sensitive to outliers

It may be appropriate to remove such data points

77
Q

What is the normality of residuals assumption for regression?

A

Residuals should be normally distributed around the predicted outcome

78
Q

What is the linearity of residuals assumption for regression?

A

Residuals should have a straight line relationship with the outcome

79
Q

What is the homoscedasticity of residuals assumption for regression?

A

Variance of residuals about the outcome should be the same for all predicted scores

80
Q

What is the non-parametric equivalent for regression?

A

There is none

Instead, we can attempt a fix

81
Q

What do we check to determine whether the assumptions for regression have been met?

A

Refer to a scatterplot

82
Q

How do we know the assumptions for regression have been met based on a Normal P-P Plot of Regression Standardized Residual results?

A

Data points will lie in a reasonably straight diagonal line, from bottom left to top right

This would suggest no major deviations from normality

83
Q

Data points will lie in a reasonably straight diagonal line, from bottom left to top right

What does this suggest?

A

No major deviations from normality

84
Q

How do we know the homoscedasticity assumption for regression has been met based on a scatterplot of Regression Standardized Residual results?

A

Residuals will be roughly rectangularly distributed, with most scores concentrated in the centre (0)

Don't want to see systematic patterns to residuals (curvilinear, or higher on one side)

85
Q

What are considered outliers on a scatterplot of Regression Standardized Residual results?

A

Standardised residuals > 3.3 or < -3.3
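A quick sketch of that check in Python (the residual values are made up for illustration):

```python
# Flag standardised residuals beyond +/-3.3 as outliers
std_residuals = [0.4, -1.2, 3.5, -3.6, 0.0, 2.9]

outliers = [r for r in std_residuals if r > 3.3 or r < -3.3]
# picks out 3.5 and -3.6
```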

86
Q

What does R^2 measure?

A

The proportion of variance explained by the model

87
Q

The proportion of variance explained by the model is measured by…?

A

R^2

88
Q

What does R measure?

A

The strength of the relationship between x and y (equivalent to r if there is only one predictor variable)

89
Q

The strength of the relationship between x and y (equivalent to r if there is only one predictor variable) is measured by…?

A

R

90
Q

What does adjusted R^2 measure?

A

Adjusted R^2 has been adjusted to account for the degrees of freedom (number of participants and number of parameters being estimated)

91
Q

If we wanted to use the regression model to generalise the results of our sample to the population, which one would we use?

a. R
b. R^2
c. Adjusted R^2

A

c. Adjusted R^2

92
Q

Why do we use adjusted R^2 when using the regression model to generalise the results of our sample to the population?

A

Because R^2 is too optimistic

Adjusted R^2 has been adjusted to account for the degrees of freedom (number of participants and number of parameters being estimated)

93
Q

How do we report the F-value based on SPSS results?

A

F(df Regression, df Residual) = F-value, p = p-value

94
Q

Where do we find the intercept (a) and the slope (b) on the SPSS output?

A

Look at the ‘Coefficients’ table

The intercept (a) is the ‘(Constant)’ row under the ‘B’ column

The slope (b) is the row below the ‘(Constant)’ row under the ‘B’ column

95
Q

The slope converted to a standardised score is known as…?

A

Beta

96
Q

What is Beta?

A

The slope converted to a standardised score

97
Q

What is the t-value equivalent to?

A

√F when we only have 1 predictor variable

98
Q

An equivalent to √F when we only have 1 predictor variable is known as…?

A

t-value

99
Q

t-statistic tests the null hypothesis that …?

A

The value of b is 0

100
Q

What does the same job as the F-test when we have just one predictor variable?

A

t-statistic test

101
Q

The amount of variance in y explained by the model (SSM), relative to the total variance in y (SST)

This is known as…?

A

R^2

102
Q

What is R^2?

A

The amount of variance in y explained by the model (SSM), relative to the total variance in y (SST)

103
Q

What is the formula for R^2?

A

R^2 = SSM / SST

104
Q

How can we express R^2 as a %?

A

Multiply by 100

105
Q

If R^2 = .32, we would conclude that the regression model explained ____% of the variance in y

A

32

106
Q

What does r^2 measure in a correlation?

A

The proportion of shared variance between two variables

107
Q

In regression, we assume that x _____ the variance in y

A

Explains

108
Q

r^2 is equivalent to R^2 if …?

A

We have only 1 predictor

109
Q

SQRT R^2 = r if …?

A

We have only 1 predictor

110
Q

If we have only 1 predictor, how do we calculate r from R^2?

A

SQRT R^2
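A small Python sketch tying these R^2 cards together (the sums of squares are made up):

```python
import math

SSM = 16.0   # variance explained by the model (made-up value)
SST = 50.0   # total variance in y (made-up value)

R2 = SSM / SST        # proportion of variance explained
percent = R2 * 100    # "the model explained 32% of the variance in y"
r = math.sqrt(R2)     # equals Pearson's r only with a single predictor
```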

111
Q

If we have only 1 predictor, what is r^2 equivalent to?

A

R^2

112
Q

True or False?

In regression, we assume that y explains the variance in x

A

False

In regression, we assume that x explains the variance in y

113
Q

Assesses how much y will change as a result of a given change in x

a. Simple linear regression
b. Multiple regression

A

a. Simple linear regression

114
Q

Assesses the influence of several predictor variables (e.g. x1, x2, x3 etc…) on the outcome variable (y)

a. Simple linear regression
b. Multiple regression

A

b. Multiple regression

115
Q

What does multiple regression allow us to do?

A

Assess the influence of several predictor variables (e.g. x1, x2, x3 etc…) on the outcome variable (y)

116
Q

We obtain a measure of how much variance in the outcome variable (y) the predictor variables (x1 and x2) combined explain

a. Simple linear regression
b. Multiple regression

A

b. Multiple regression

117
Q

How can we obtain a measure of how much variance in the outcome variable (y) the predictor variables (x1 and x2) combined explain?

A

By constructing a model which incorporates the slopes of each predictor variable

118
Q

We also obtain measures of how much variance in the outcome variable (y) our predictor variables (x1 and x2) explain when considered separately

a. Simple linear regression
b. Multiple regression

A

b. Multiple regression

119
Q

What are the 2 things we obtain through multiple regression?

A
  1. A measure of how much variance in the outcome variable (y) the predictor variables (x1 and x2) combined explain
  2. Measures of how much variance in the outcome variable (y) our predictor variables(x1 and x2) explain when considered separately
120
Q

What is the formula for the (multiple) regression equation

A

y = b1x1 + b2x2 + b3x3… + a

121
Q

Proposing a model to explain the relationship between all predictors (x1,x2) and y

This is known as…?

A

Multiple regression

122
Q

What are the 3 stages of multiple regression?

A
  1. Simple regression: Proposing a model to explain the relationship between each individual predictor and y (x1 and y, x2 and y, etc.)
  2. Multiple regression: Proposing a model to explain the relationship between all predictors (x1, x2) and y
  3. Evaluating the model: Goodness-of-fit
122
Q

What are the 5 assumptions of multiple regression?

A
  1. Sufficient sample size
  2. Linearity
  3. Absence of outliers
  4. Multicollinearity
  5. Normality, linearity and homoscedasticity, independence of residuals
123
Q

What is the sufficient sample size assumption of a multiple regression?

A

To look at the combined effect of several predictors:
N > 50 + 8M
e.g. for 3 predictor variables you need at least 74 Ps

To look at the separate effects of several predictors:
N > 104 + M
e.g. for 3 predictor variables you need at least 107 Ps

Too few participants may result in over-optimistic results (results may not be generalisable)
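These rules of thumb can be sketched as two helper functions (the function names are my own, not from the cards):

```python
# Rough minimum sample sizes for multiple regression with m predictors
def n_for_combined_effects(m):
    """Minimum N to test the combined effect of m predictors (N > 50 + 8M)."""
    return 50 + 8 * m

def n_for_separate_effects(m):
    """Minimum N to test each predictor's separate effect (N > 104 + M)."""
    return 104 + m

# e.g. 3 predictors: 50 + 8*3 = 74, and 104 + 3 = 107
```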

124
Q

What is the linearity assumption of a multiple regression?

A

Predictor variables should be linearly related to the outcome variable

125
Q

What is the absence of outliers assumption of a multiple regression?

A

Regression, like correlation, is extremely sensitive to outliers

It may be appropriate to remove such data points

126
Q

What is the multicollinearity assumption of a multiple regression?

A

Ideally, predictor variables will be correlated with the outcome variable but not with one another

Check the correlation matrix before performing the regression analysis

Predictor variables which are highly correlated with one another (r = .9 and above) are measuring much the same thing

It may be appropriate to combine the correlated predictor variables or to remove one

127
Q

What is the normality independence of residuals assumption of a multiple regression?

A

Residuals should be normally distributed around the predicted outcome

128
Q

What is the linearity independence of residuals assumption of a multiple regression?

A

Residuals should have a straight line relationship with the outcome

129
Q

What is the homoscedasticity independence of residuals assumption of a multiple regression?

A

Variance of residuals about the outcome should be the same for all predicted scores

130
Q

Residuals should have a straight line relationship with the outcome

a. homoscedasticity independence of residuals
b. normality independence of residuals
c. linearity independence of residuals

A

c. linearity independence of residuals

131
Q

Residuals should be normally distributed around the predicted outcome

a. homoscedasticity independence of residuals
b. normality independence of residuals
c. linearity independence of residuals

A

b. normality independence of residuals

132
Q

Variance of residuals about the outcome should be the same for all predicted scores

a. homoscedasticity independence of residuals
b. normality independence of residuals
c. linearity independence of residuals

A

a. homoscedasticity independence of residuals

133
Q

How do we check whether the assumptions for multiple regression have been met?

List 2 points

A
  1. Scatterplots
  2. Correlation matrix
134
Q

How do we know the assumptions for multiple regression have been met based on a Normal P-P Plot of Regression Standardized Residual results?

A

Data points will lie in a reasonably straight diagonal line, from bottom left to top right

This would suggest no major deviations from normality

135
Q

How do we know the homoscedasticity assumption for multiple regression has been met based on a scatterplot of Regression Standardized Residual results?

A

Residuals will be roughly rectangularly distributed, with most scores concentrated in the centre (0)

Don't want to see systematic patterns to residuals (curvilinear, or higher on one side)

136
Q

What are considered outliers on a scatterplot of Multiple Regression Standardized Residual results?

A

Standardised residuals > 3.3 or < -3.3

137
Q

B

age = 1.067
naughty list rating = -8.962

Beta

age = .492
naughty list rating = -.294

What interpretations can be made from this data?

A

For every 1 year increase in age, Christmas joy increases by 1.07 points

As age increases by 1 SD, Christmas joy increases by 0.49 SDs

For every 1 point higher on the naughty list rating, Christmas joy decreases by 8.96 points

As naughty list rating increases by 1SD, Christmas joy decreases by 0.29 SDs

138
Q

B

age = 1.067
naughty list rating = -8.962

Beta

age = .492
naughty list rating = -.294

a= 62.612

How much Christmas joy would you predict for a 41 year-old who scores 5.8 on Santa’s naughty list for 2020?

A

y = b1x1 + b2x2 + a
y = (1.067 * age) + (-8.962 * naughty) + 62.612
y = (1.067 * 41) + (-8.962 * 5.8) + 62.612
y = (43.747) + (-51.9796) + 62.612
y = 54.3794
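The same working, sketched in Python (coefficients taken from this card):

```python
# Multiple regression prediction: y = b1*x1 + b2*x2 + a
b_age = 1.067
b_naughty = -8.962
a = 62.612

age = 41
naughty = 5.8

joy = b_age * age + b_naughty * naughty + a   # about 54.38
```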

139
Q

Assesses the significance of each predictor separately

This is known as…?

A

t-values

140
Q

What do the t-values in multiple regression output on SPSS tell us?

A

How much each individual predictor, separately, improves the prediction of y

141
Q

How much each individual predictor, separately, improves the prediction of y

a. P-value
b. F-value
c. t-value
d. M-value

A

c. t-value

142
Q

What is the null hypothesis of multiple regression model?

A

The regression model (with all predictors) and the simplest model are equal (in terms of predicting y)