Multiple linear regression Flashcards

1
Q

Define

Unstandardised coefficient

A

represents the expected change in the DV for a one-unit change in the IV. Its magnitude is hard to interpret because it depends on the units of measurement

2
Q

Define

Standardised coefficient (β)

A

represents the expected change in the DV, in standard deviations, for a one standard deviation change in the predictor

3
Q

Define

Adjusted R^2

A

gives an estimate of R^2 in the population, taking into account the fact that the regression model might overfit your particular dataset

4
Q

Define

Sampling error

A

the discrepancy between sample and population

5
Q

Define

R^2 shrinkage

A

the drop in R^2 when the regression model is applied to a new sample, i.e. failure to replicate the original R^2

6
Q

Define

Cohen’s f^2

A

a measure of effect size based on R^2; tells you the unique effect of one predictor variable

7
Q

Define

Homoscedasticity

A

the residuals have equal (constant) variance at every level of the predictors

8
Q

Define

Heteroscedasticity

A

the variance of the residuals is unequal across levels of the predictors

9
Q

Define

Multiple regression

A

a type of regression that is used to predict values of an outcome from several predictors

10
Q

Define

Multicollinearity

A

the presence of high intercorrelations between predictor variables

11
Q

Define

F-ratio

A

The overall significance of the regression equation; a significant value indicates that the equation predicts a significant proportion of the variability in the Y scores (i.e., more than would be expected by chance alone)

12
Q

Define

Dummy coding

A

the process of coding a categorical variable into dichotomous variables
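
As a rough illustration of dummy coding, the sketch below turns one categorical variable into 0/1 indicator variables, leaving out a reference category. The category names and the helper name dummy_code are hypothetical and are not taken from the cards.

```python
def dummy_code(values, reference):
    """Code a categorical variable as 0/1 indicators, omitting the reference level."""
    # One indicator per non-reference level, in alphabetical order
    levels = sorted(set(values) - {reference})
    return [{f"is_{level}": int(v == level) for level in levels} for v in values]

# Hypothetical categorical predictor with three levels
status = ["single", "married", "divorced", "single"]
rows = dummy_code(status, reference="single")
for value, row in zip(status, rows):
    print(value, row)
```

A variable with three categories yields two dichotomous variables; the reference category is the one coded 0 on both.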

13
Q

Define

Semi-partial (part) correlation

A

Measures the relationship between two variables, controlling for the effect that a third variable has on one of the others

14
Q

Definition

represents the expected change in the DV for a one-unit change in the IV. Its magnitude is hard to interpret because it depends on the units of measurement

A

Unstandardised coefficient

15
Q

Definition

represents the expected change in the DV, in standard deviations, for a one standard deviation change in the predictor

A

Standardised coefficient (β)

16
Q

Definition

gives an estimate of R^2 in the population, taking into account the fact that the regression model might overfit your particular dataset

A

Adjusted R^2

17
Q

Definition

the discrepancy between sample and population

A

Sampling error

18
Q

Definition

the drop in R^2 when the regression model is applied to a new sample, i.e. failure to replicate the original R^2

A

R^2 shrinkage

19
Q

Definition

a measure of effect size based on R^2; tells you the unique effect of one predictor variable

A

Cohen’s f^2

20
Q

Definition

the residuals have equal (constant) variance at every level of the predictors

A

Homoscedasticity

21
Q

Definition

the variance of the residuals is unequal across levels of the predictors

A

Heteroscedasticity

22
Q

Definition

a type of regression that is used to predict values of an outcome from several predictors

A

Multiple regression

23
Q

Definition

the presence of high intercorrelations between predictor variables

A

Multicollinearity

24
Q

Definition

The overall significance of the regression equation; a significant value indicates that the equation predicts a significant proportion of the variability in the Y scores (i.e., more than would be expected by chance alone)

A

F-ratio

25
Q

Definition

the process of coding a categorical variable into dichotomous variables

A

Dummy coding

26
Q

Definition

Measures the relationship between two variables, controlling for the effect that a third variable has on one of the others

A

Semi-partial (part) correlation

27
Q

What does the unstandardised coefficient of this relationship mean?

A

-1.584 indicates that a 1-unit increase in self-esteem is predicted to be associated with a 1.584-unit decrease in loneliness

28
Q

Why is it hard to interpret the magnitude of an effect from unstandardised coefficients?

A

Coefficients depend on the scale of the predictors and the DVs

If self-esteem ranges from 0 to 2, a 1 unit change is a huge difference; if the range is 0 to 100, a 1 unit change is very small

29
Q

What does the standardised beta coefficient tell you about this relationship?

A

In the current regression, β = -.257, meaning that a 1 SD increase in self-esteem is expected to predict a .257 SD decrease in loneliness
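
The link between the unstandardised and standardised coefficients can be sketched as follows, using the standard rescaling β = b × (SD of predictor / SD of outcome). The slope and standard deviations below are hypothetical values chosen for illustration; the cards do not report the SDs behind β = -.257.

```python
def standardised_beta(b, sd_predictor, sd_outcome):
    """Rescale an unstandardised slope b into standard deviation units."""
    return b * sd_predictor / sd_outcome

# Hypothetical slope and SDs, for illustration only
beta = standardised_beta(b=-1.5, sd_predictor=2.0, sd_outcome=10.0)
print(round(beta, 3))  # -0.3
```

Because β is expressed in SD units, it no longer depends on the measurement scales of the predictor or the DV.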

30
Q

Imagine that the variance of the DV is 5, and after calculating the regression, the variance of the residuals is 4. What is the model explained variance?

A

5 - 4 = 1

31
Q

What is the formula for R^2?

Imagine that the variance of the DV is 5, and after calculating the regression, the variance of the residuals is 4. What is R^2?

A

R^2 = MEV / Total variance

Model explained variance (MEV) = 5 - 4 = 1

Total variance = 5

R^2 = 1 / 5 = 0.2 = 20%
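
The same arithmetic as a minimal sketch, using the variances given in the question (the function name r_squared is just an illustrative choice):

```python
def r_squared(total_variance, residual_variance):
    """R^2 = model explained variance / total variance."""
    explained = total_variance - residual_variance
    return explained / total_variance

r2 = r_squared(total_variance=5, residual_variance=4)
print(r2)  # 0.2
```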

32
Q

What is considered a small, medium and large R^2?

A

There are no fixed rules for what a small, medium, or large R^2 is, but many people consider .04 (4%) as small, .09 (9%) as medium, and .25 (25%) as large

33
Q

What is adjusted R^2? What does it take into account?

A

Gives an estimate of R^2 in the population. It takes into account sample size and the number of predictors
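
One widely used adjustment, and the one most statistics packages report, is adjusted R^2 = 1 - (1 - R^2)(n - 1) / (n - k - 1), where n is the sample size and k the number of predictors. The sketch below assumes that formula; the sample size is hypothetical and chosen for illustration.

```python
def adjusted_r_squared(r2, n, k):
    """Adjust R^2 for sample size n and number of predictors k."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

# n = 100 is a hypothetical sample size, for illustration only
adj = adjusted_r_squared(r2=0.07, n=100, k=2)
print(round(adj, 3))

# The same R^2 is penalised more heavily in a smaller sample
adj_small = adjusted_r_squared(r2=0.07, n=20, k=2)
print(round(adj_small, 3))
```

The adjustment grows as n shrinks or k grows, which matches the way sampling error behaves.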

34
Q

Sampling error increases as sample size _________ and as the number of predictors ___________

A

Sampling error increases as sample size decreases and as the number of predictors increases

35
Q

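Why is R^2 likely to overestimate the size of the effect in the population?

A

Because the model is fitted to the sample, it capitalises on sampling error as well as the true relationship. Sampling error increases as sample size decreases and as the number of predictors increases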

36
Q

Shrinkage is best evaluated using a ______________

A

Shrinkage is best evaluated using a cross validation study

37
Q

What is it called when a regression model does not work with a new data set?

A

Shrinkage

38
Q

What happens if there is a large discrepancy between R^2 and adjusted R^2?

A

it indicates that the regression model does not generalise well to the population

39
Q

When is Cohen’s f^2 useful?

A

Most useful in multiple regression, where a variant of f^2 gives the unique effect of one predictor

40
Q

Age accounted for 10% of the variance in loneliness on its own

Age + self-esteem combined accounted for 15% of variance in loneliness

What is the f^2 for self-esteem?

A

The difference between these two is the variance unique to self-esteem (15% - 10% = 5%)

The f^2 for the unique effect of self-esteem would be:

f^2 = (.15 − .10) / (1 − .15) = .05 / .85 = .059
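
The calculation above can be sketched as a small function. The parameter names r2_full (model with the predictor of interest) and r2_reduced (model without it) are illustrative labels rather than terms from the cards.

```python
def cohens_f2(r2_full, r2_reduced):
    """Cohen's f^2 for the unique effect of the predictor(s) added to the reduced model."""
    return (r2_full - r2_reduced) / (1 - r2_full)

# Age alone: R^2 = .10; age + self-esteem: R^2 = .15
f2 = cohens_f2(r2_full=0.15, r2_reduced=0.10)
print(round(f2, 3))  # 0.059
```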

41
Q

What assumptions must be met to run a linear regression?

A
  • Outcome variable must be continuous; predictors can be continuous or dichotomous
  • Predictors must not have zero variance
  • Independence
  • Linearity
  • Independent errors
  • Normally-distributed errors
  • Homoscedasticity
42
Q

What type of plot can we use to look for non-linearity in regression? What indicates that the assumption has been met?

A

Partial regression plot (i.e. residuals plotted against predicted values)

The absence of a clear pattern means the assumption has been met

43
Q

How do we deal with non-linear data in regression?

A
44
Q

What does the independent errors assumption mean?

A

For any 2 observations, the errors or residual terms should be independent of (i.e., uncorrelated with) one another

Observations should be independent and there should not be any systematic relationship amongst the residuals

45
Q

What does the normally-distributed errors assumption mean?

A

Residuals for the regression model should be random and normally distributed, with mean = 0

Note: the predictors themselves do not need to be normally distributed (although normally distributed predictors do improve the chances of this assumption being met)

46
Q

What assumption does this scatterplot violate?

A

Homoscedasticity

47
Q

What should you report from a linear regression?

A
  1. How you assessed the assumptions and whether they appear violated
  2. Any attempts to address violated assumptions
  3. Whether these attempts resolved the assumption issues or whether the assumptions were still violated
  4. Descriptive statistics (e.g. mean and SD/median and IQR/frequency and percent)
  5. Standardised coefficient, effect size, R^2, confidence intervals and p-value
48
Q

How is the multiple regression equation different from the simple linear regression one?

A

Add b_kX_k terms, where k = number of predictor variables included

In multiple regression, b_k captures the unique relationship between X_k and the outcome, conditional on (adjusted for) all other predictors in the model

49
Q

Which variable is the best predictor?

A

The β value with the largest magnitude (ignoring the +/- signs) is the best predictor

Thus, self-esteem is the best predictor of loneliness – indeed, it is the only significant predictor

50
Q

What does the R^2 of a multiple regression model tell you? What does it mean for this example?

A

R Square (R^2) tells you the proportion of variability in the outcome variable that is accounted for by the predictor variables

In our example, 7% of the variability in loneliness is accounted for by self-esteem and number of exercise days (when considered together)

51
Q

What statistics do we use to measure multicollinearity?

A

Tolerance: predictor variables with tolerance < .10 are multicollinear with one or more other predictors, which is concerning (i.e., you have a multicollinearity problem)

VIF: VIF = 1/tolerance; predictor variables with VIF > 10 are multicollinear with one or more other predictors, which is concerning (i.e., you have a multicollinearity problem)
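
A minimal sketch of how the two statistics relate, assuming the usual definition of tolerance as 1 - R^2_j, where R^2_j comes from regressing predictor j on all the other predictors. The R^2_j values below are hypothetical.

```python
def tolerance(r2_j):
    """Tolerance of predictor j, given R^2 from regressing it on the other predictors."""
    return 1 - r2_j

def vif(r2_j):
    """Variance inflation factor: the reciprocal of tolerance."""
    return 1 / tolerance(r2_j)

def is_multicollinear(r2_j):
    """Apply the cut-offs from the card: tolerance < .10 (equivalently VIF > 10)."""
    return tolerance(r2_j) < 0.10

# Hypothetical R^2_j values for two predictors
for r2_j in (0.30, 0.95):
    print(r2_j, round(tolerance(r2_j), 2), round(vif(r2_j), 2), is_multicollinear(r2_j))
```

Because VIF is simply 1/tolerance, the two cut-offs flag the same predictors.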

52
Q

What extra assumption does multiple regression have that linear regression doesn’t?

A

No multicollinearity (the predictors should not be too highly intercorrelated)

53
Q

What should you do if you have an issue with multicollinearity?

A

Consider deleting one of the offending variables

Combine the variables with high intercorrelations into a single measure

54
Q

What do the semi-partial correlations tell us in this example?

A

The variability in loneliness uniquely accounted for by self-esteem = (-.261)^2 × 100 = 6.81%

The variability in loneliness uniquely accounted for by exercise = (.058)^2 × 100 = 0.34%
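
The two percentages above come from squaring each semi-partial correlation, which can be sketched as (the function name is an illustrative choice):

```python
def unique_variance_percent(semi_partial_r):
    """Percentage of variance in the outcome uniquely explained by one predictor."""
    return semi_partial_r ** 2 * 100

self_esteem = unique_variance_percent(-0.261)
exercise = unique_variance_percent(0.058)
print(round(self_esteem, 2))  # 6.81
print(round(exercise, 2))     # 0.34
```

Squaring removes the sign, so the direction of the relationship has to be read from the coefficient itself.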

55
Q

The overall significance of the regression equation can be evaluated by computing an ________

A

The overall significance of the regression equation can be evaluated by computing an F-ratio

56
Q

What does a significant F-ratio indicate?

A

A significant F-ratio indicates that the equation predicts a significant proportion of the variability in the Y scores (i.e., more than would be expected by chance alone)
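
For reference, the overall F-ratio can be computed from R^2 with the standard formula F = (R^2 / k) / ((1 - R^2) / (n - k - 1)), where k is the number of predictors and n the sample size. In the sketch below, the R^2 of .07 and k = 2 follow the loneliness example, while n = 200 is a hypothetical sample size.

```python
def f_ratio(r2, n, k):
    """Overall F-ratio for a regression with k predictors and n observations."""
    explained_per_df = r2 / k
    unexplained_per_df = (1 - r2) / (n - k - 1)
    return explained_per_df / unexplained_per_df

# n = 200 is a hypothetical sample size, for illustration only
f = f_ratio(r2=0.07, n=200, k=2)
print(round(f, 2))
```

The result is compared against the F distribution with k and n - k - 1 degrees of freedom to obtain the p-value.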