Lecture 9: Multiple Regression Flashcards

1
Q

Multiple regression

A

Statistical technique that allows us to examine the relationship between one outcome and multiple predictors.

A regression model can be expanded to include as many predictors as needed: ANOVA is already an example of multiple regression. The model with two dummies we saw earlier is also an example of multiple regression.

Answers the question: what is the unique effect of one predictor, controlling for the effect of all other predictors?

2
Q

Multiple regression formula

A

Ŷi = a + b1 * X1i + b2 * X2i

a is the intercept (the expected value of Y when all predictors are equal to 0), Ŷi is the predicted value of the dependent variable Y for observation i, and b1 and b2 are slopes. Each b tells us how many points Y increases if the corresponding X goes up by 1, while keeping all other X-values equal.
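As an illustration (simulated data; the variable names and coefficient values below are made up, not from the lecture), ordinary least squares recovers the intercept and both slopes of a two-predictor model:

```python
import numpy as np

# Simulate data from the model Y = a + b1*X1 + b2*X2 + noise,
# with illustrative true values a = 2.0, b1 = 0.5, b2 = 1.5
rng = np.random.default_rng(0)
n = 200
X1 = rng.normal(size=n)
X2 = rng.normal(size=n)
Y = 2.0 + 0.5 * X1 + 1.5 * X2 + rng.normal(scale=0.1, size=n)

# Design matrix with a column of ones for the intercept a
X = np.column_stack([np.ones(n), X1, X2])
coef, *_ = np.linalg.lstsq(X, Y, rcond=None)
a, b1, b2 = coef
print(a, b1, b2)  # close to 2.0, 0.5, 1.5
```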

3
Q

Centering predictors

A

By centering, we shift the zero-point of the predictor to a meaningful value, such as the mean value on that predictor. This helps in interpretation, because the intercept now gives us the mean value on the outcome for someone who has an average score on all predictors.

After centering, we will see that the multiple regression line is perfectly in the middle of the data cloud again.
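A small simulation (illustrative numbers, not from the lecture) shows the interpretational payoff: after centring the predictor, the fitted intercept equals the mean of the outcome.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100
X1 = rng.normal(loc=10, size=n)   # predictor with a non-zero mean
Y = 3.0 + 0.8 * X1 + rng.normal(scale=0.5, size=n)

# Centre the predictor: shift its zero-point to the sample mean
X1c = X1 - X1.mean()

X = np.column_stack([np.ones(n), X1c])
(a, b1), *_ = np.linalg.lstsq(X, Y, rcond=None)

# The intercept is now the predicted outcome for an average
# scorer, i.e. the mean of Y; the slope is unchanged.
print(np.isclose(a, Y.mean()))  # True
```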

4
Q

When do we use multiple regression?

A
  • When we want to make better predictions using all available predictors
  • To compare relative importance of different predictors => useful when we have several potential causes of one outcome, and we want to know which of those causes is more influential
  • When a theory implies multiple causes
  • To improve causal inference (i.e., the conclusions that we draw about the effect of one predictor) by controlling for confounders
5
Q

Standardised regression coefficient

A

The regression coefficient you would get IF you carried out the analysis after standardising the X and Y variables. A one-SD increase in X (the independent variable) is associated with an increase of β SDs in Y (the dependent variable).

  • Useful for when we want to know how important different predictors are
  • Also useful when we want to compare the effect of the same variable across two studies
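For a simple regression, the standardised coefficient can be obtained either by standardising both variables before fitting, or via the shortcut β = b × SD(X) / SD(Y). A sketch with simulated data (all numbers illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 500
X = rng.normal(loc=5, scale=2, size=n)
Y = 1.0 + 0.7 * X + rng.normal(size=n)

def slope(x, y):
    # simple-regression slope: cov(x, y) / var(x)
    return np.cov(x, y)[0, 1] / np.var(x, ddof=1)

b = slope(X, Y)                                   # unstandardised
beta = slope((X - X.mean()) / X.std(ddof=1),
             (Y - Y.mean()) / Y.std(ddof=1))      # standardised

# Equivalent shortcut: beta = b * SD(X) / SD(Y)
print(np.isclose(beta, b * X.std(ddof=1) / Y.std(ddof=1)))  # True
```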
6
Q

Unstandardised regression coefficient

A

A one-unit increase in X (the independent variable) is associated with a b-unit increase in Y (the dependent variable).

Can be used when the units are meaningful/important (e.g. years, euros, centimeters, number of questions correct). Also useful if there are (clinical) cut-off scores (e.g., a predicted value of 60 or higher corresponds with being depressed).

7
Q

Multicollinearity

A

Problem with multiple regression: multiple regression gives us the unique effect of each predictor, controlling for all other predictors. But what if multiple predictors overlap substantially?

Example: what would happen if I predicted total body length from the length of people’s left leg and right leg?
- It’s possible to predict body length from leg length
- Legs are approximately equally long
- They both predict body length equally strongly
- Their effects overlap nearly 100%
- In other words, the left leg does not have a unique effect controlling for right leg, or vice versa

Multicollinearity occurs when two or more predictors explain the same variance in the outcome.
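The leg-length example can be simulated (illustrative numbers; the coefficients and noise levels below are made up). Adding the second, nearly identical leg blows up the standard errors while leaving R-square essentially untouched:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 100
left = rng.normal(loc=90, scale=5, size=n)
right = left + rng.normal(scale=0.1, size=n)       # almost identical
height = 2 * left + rng.normal(scale=2, size=n)

def fit(predictors, y):
    # OLS fit returning coefficients, standard errors, and R-square
    X = np.column_stack([np.ones(len(y)), *predictors])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    sigma2 = resid @ resid / (len(y) - X.shape[1])
    se = np.sqrt(sigma2 * np.diag(np.linalg.inv(X.T @ X)))
    r2 = 1 - (resid @ resid) / ((y - y.mean()) @ (y - y.mean()))
    return coef, se, r2

_, se_one, r2_one = fit([left], height)            # one leg
_, se_two, r2_two = fit([left, right], height)     # both legs

# The slope SE is many times larger with both collinear legs in
# the model, but R-square barely moves.
print(se_two[1] / se_one[1], r2_two - r2_one)
```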

8
Q

Causes of multicollinearity

A

1) You have a small dataset and values of predictors tend to go together
- Extreme examples: 2 participants, 2 categorical predictors
- Person 1 is Dutch and has a tattoo, person 2 is international and has no tattoo
- Variables ‘nationality’ and ‘tattoo’ are perfectly collinear

2) Several of your variables measure the same thing
- Father’s SES and mother’s SES (similar when they have a shared bank account), or emotion regulation and neuroticism (activation in 2 brain regions that are jointly activated)

9
Q

Consequences of multicollinearity

A

1) Biased estimates of the unique effect of collinear predictors (one predictor, e.g., the right leg, may appear to have a much larger effect than another, e.g., the left leg, even though this is not true)

2) Inflated standard errors for the unique effects, indicating that the estimation procedure is highly uncertain about whether these are the correct parameter estimates

3) No effect on the model’s predictive ability! R-square is unaffected

Only the parameters’ estimates are biased, not the overall model predictions

10
Q

Causality

A

Can only be established in experiments using random assignment, or assumed based on theory.

11
Q

Mediators

A

Variables that explain the relationship between the independent and dependent variables. It identifies the “how” and “why” behind the observed relationship.

Imagine you are studying the relationship between exercise (independent variable) and improved mood (dependent variable). Self-esteem could act as a mediator, indicating that exercise leads to increased self-esteem, which, in turn, influences mood.
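The exercise example can be sketched with simulated data (all path coefficients below are invented for illustration): the total effect of exercise on mood runs through self-esteem, so controlling for the mediator makes the direct effect vanish.

```python
import numpy as np

rng = np.random.default_rng(6)
n = 100_000
exercise = rng.normal(size=n)
self_esteem = 0.8 * exercise + rng.normal(size=n)   # mediator
mood = 0.6 * self_esteem + rng.normal(size=n)       # outcome

def slope(x, y):
    return np.cov(x, y)[0, 1] / np.var(x, ddof=1)

# Total effect of exercise on mood: about 0.48 (= 0.8 * 0.6)
total = slope(exercise, mood)
print(total)

# Controlling for the mediator, the direct effect is about 0
X = np.column_stack([np.ones(n), exercise, self_esteem])
coef, *_ = np.linalg.lstsq(X, mood, rcond=None)
direct = coef[1]
print(direct)
```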

12
Q

Colliders

A

Variables that are caused by both the independent and dependent variables. Controlling for colliders can lead to spurious statistical relationships between unrelated variables.

Suppose you’re investigating the relationship between ice cream consumption (independent variable) and the number of drowning incidents (dependent variable). Sunscreen use could act as a collider because it is affected by both hot weather (related to ice cream consumption) and swimming (related to drowning incidents). Conditioning on sunscreen use might create a misleading association between ice cream consumption and drowning.
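Collider bias is easy to reproduce in simulation (abstract variables here, not the ice-cream example): two truly unrelated variables become correlated once we condition on a variable they both cause.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 100_000
x = rng.normal(size=n)                 # independent variable
y = rng.normal(size=n)                 # unrelated to x by construction
collider = x + y + rng.normal(size=n)  # caused by both

def corr(a, b):
    return np.corrcoef(a, b)[0, 1]

r_all = corr(x, y)
print(r_all)                           # close to 0: truly unrelated

# "Controlling for" the collider by selecting high-collider cases
mask = collider > 1
r_sel = corr(x[mask], y[mask])
print(r_sel)                           # clearly negative: spurious!
```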

13
Q

Confounders

A

Additional variables that are related to the primary independent variable of interest and also affect the dependent variable. These variables can introduce a systematic error (bias) into the research, leading to incorrect conclusions about the relationship between the primary independent variable and the dependent variable.
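A simulation (abstract variables, invented coefficients) shows the bias and its remedy: a common cause makes two variables look related, and regressing out the confounder removes the spurious link.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 100_000
z = rng.normal(size=n)       # confounder (e.g. a common cause)
x = z + rng.normal(size=n)   # caused by z
y = z + rng.normal(size=n)   # also caused by z, not by x

def corr(a, b):
    return np.corrcoef(a, b)[0, 1]

# x and y look related even though neither causes the other
r_raw = corr(x, y)
print(r_raw)                 # about 0.5

# Controlling for z (regressing it out) removes the spurious link
rx = x - np.polyfit(z, x, 1)[0] * z
ry = y - np.polyfit(z, y, 1)[0] * z
r_part = corr(rx, ry)
print(r_part)                # about 0
```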

14
Q

Moderators

A

Variables that affect the strength or direction of the relationship between an independent variable and a dependent variable.

For example, let’s consider a study examining the relationship between the amount of study time (independent variable) and academic performance (dependent variable), with the presence of a moderator variable such as student motivation. Student motivation could moderate the relationship: for highly motivated students, the relationship between study time and academic performance might be stronger compared to less motivated students.
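In regression, moderation is captured by an interaction term. A sketch of the study-time example with simulated data (the slope values 0.2 and 0.8 are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(7)
n = 100_000
study_time = rng.normal(size=n)
motivation = rng.integers(0, 2, size=n)   # 0 = low, 1 = high

# The slope of study time depends on motivation: 0.2 vs 0.8
performance = (0.2 + 0.6 * motivation) * study_time + rng.normal(size=n)

# A model with an interaction term captures the moderation
X = np.column_stack([np.ones(n), study_time, motivation,
                     study_time * motivation])
coef, *_ = np.linalg.lstsq(X, performance, rcond=None)
print(coef[1])   # slope for low-motivation students: about 0.2
print(coef[3])   # extra slope for high motivation: about 0.6
```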
