4 introduction to mediation analysis Flashcards

1
Q

What is mediation analysis?

A

Mediation analysis uses regression to investigate whether the relationship between a variable X and a variable Y is mediated by a third variable, M.

It asks what X does to Y and how an underlying mechanism M carries that effect.

Mediation holds if a change in X leads to a change in the mediator M, which subsequently changes the outcome Y.

The mediator is itself affected by X and in turn affects Y.

2
Q

What is the difference between a mediator variable and confounding variables?

A

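A confounder influences both X and Y, so X and Y can be associated even when X has no causal effect on Y.

A mediator lies on the causal path between them: X affects M, and M affects Y.

With a confounder there is no X-M-Y chain, because X does not cause the confounder.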

3
Q

Give two examples of possible mediator paths.

A

e.g. a therapeutic method (X) might affect symptoms experienced after the termination of therapy (Y) because the method influences how people interpret negative events that occur in life (M), and those interpretations then influence the extent to which symptoms are manifested.

e.g. traumatic experiences (X) might negatively influence happiness one gets from interpersonal interactions (Y) because traumatic experiences result in the manifestation of certain behaviors that others find uncomfortable to witness (M), and this in turn produces less pleasant interactions

4
Q

What are the pathways of a simple mediation model?

A

two pathways:

indirect effect through M

direct effect X on Y

5
Q

Does M have a causal influence on Y?

A

Yes. M is assumed to causally influence Y, and this accounts for part of the variation in Y.

However, this causal influence does not eliminate the association between X and Y.

Other names for M: mediator variable, intermediary variable, surrogate variable, intermediate endpoint

6
Q

Do X and Y need to be associated for a possible mediation?

A

“lack of correlation does not disprove causation”

“correlation is neither a necessary nor a sufficient condition of causality”

-> a simple association between X and Y is no longer a precondition for mediation

EXAMPLE: Consider a new educational program (X) designed to improve students’ test scores (Y) by increasing their motivation (M). Suppose the program has no direct effect on test scores but boosts motivation, which in turn leads to better scores. The simple correlation between X (program) and Y (scores) might still be weak or undetectable, for example if the indirect effect is small relative to the other sources of variation in scores. The program nevertheless has a causal effect on the scores through the mediator (motivation).

7
Q

What if X and M interact with each other? Does it change the statistical analysis?

A

If the effect of M on Y is not constant but changes depending on X, then X and M interact.

-> this needs to be accounted for
-> include an interaction term XM (as in moderation analysis)

-> coefficient b needs to be reconsidered, because the effect of M on Y now depends on X

-> the direct effect of X on Y is affected
-> there is no longer a single direct effect, because it changes depending on M
(key difference between mediation and moderation analysis!)
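
The point about the interaction term can be written out as a worked equation. This is only a sketch: the symbol d for the coefficient of the XM product is an assumption introduced here and does not appear in the notes.

```latex
\begin{aligned}
Y &= i_Y + c'X + bM + d\,XM + e_Y \\
  &= i_Y + (c' + dM)\,X + bM + e_Y \\
  &= i_Y + c'X + (b + dX)\,M + e_Y
\end{aligned}
```

Read this way, the effect of X on Y holding M constant is c' + dM, and the effect of M on Y holding X constant is b + dX, so neither is a single number unless d = 0.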

8
Q

Should there be testing for a possible interaction XM?

A

No

selective testing
evidence-based decision

no reason for prioritisation
-> equal possibility for correlations!

overfitting a model is unnecessary

9
Q

What is sufficient to conclude an indirect effect/mediation of X-M-Y?

A

A rejection of the null hypothesis that the indirect effect is zero (or an interval estimate that doesn’t include zero) is sufficient to support a claim of mediation of the effect of X on Y through M.

Tests of significance for the individual paths a and b are not required to determine whether M mediates the effect of X on Y, contrary to the causal steps logic, which requires that both a and b are statistically significant.

Indeed, one does not even need to establish that the total effect of X as quantified by c is different from zero, since the size of c does not determine or constrain the size of ab.

⇒ Rather, all that matters is whether ab is different from zero by some kind of inferential standard such as a null hypothesis test or confidence interval.

10
Q

What are three principles of mediation analysis?

A
  • empirical claims should be based on a quantification of the effect most directly relevant to that claim
    • if ab quantifies the movement of Y by X through M, base the inference on ab, not on a and b separately
    • separate significance tests on a and b are not a test of whether ab differs from zero
  • a claim should be based on as few inferential tests as required in order to support it
    • inferential tests are fallible by nature
    • why require three tests when one test of ab suffices?
  • convey information about the uncertainty attached to estimates of quantities
    • report an interval estimate rather than only a dichotomous decision about whether M mediates
11
Q

What is evidence of an existing mediation effect?

A

⇒ if the effect of X on Y when M is held constant (coefficient c’ in equation (3), called the direct effect of X) is closer to zero than is X’s effect without controlling for M (coefficient c in equation (1), the total effect of X), then M can be deemed a mediator of X’s effect on Y.

⇒ if M is held constant, the magnitude of the direct effect of X on Y diminishes

12
Q

What is partial mediation, what is complete mediation?

A

partial mediation = pattern of findings in which mediation is established, the total effect of X is significant, and the direct effect of X (c´) is also different from zero

-> the effect of X on Y is not fully explained by X-M-Y

complete/full mediation = all of the effect of X on Y is carried through the mediation process, meaning ab=c and c´=0

13
Q

Which two linear models are required for a mediation analysis?

A

see notes.

M = i_M + aX + e_M

Y = i_Y + c´X + bM + e_Y

a = effect of X on M
b = effect of M on Y, holding X constant
c´ = direct effect of X on Y, holding M constant
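
The two models can be estimated with ordinary least squares. The following is a minimal sketch in plain Python, not a reference implementation: the function names, the simulated data, and the path values 0.5, 0.7 and 0.2 are all assumptions made for illustration.

```python
import random

def cross_products(u, v):
    """Sum of cross-products of deviations from the means."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    return sum((ui - mu) * (vi - mv) for ui, vi in zip(u, v))

def fit_simple_mediation(x, m, y):
    """OLS slopes for M = i_M + aX + e_M and Y = i_Y + c'X + bM + e_Y."""
    sxx, smm, sxm = cross_products(x, x), cross_products(m, m), cross_products(x, m)
    sxy, smy = cross_products(x, y), cross_products(m, y)
    a = sxm / sxx                              # X on M
    det = sxx * smm - sxm ** 2                 # determinant of the normal equations
    c_prime = (sxy * smm - smy * sxm) / det    # X on Y, M held constant
    b = (smy * sxx - sxy * sxm) / det          # M on Y, X held constant
    return a, b, c_prime

# Simulated data with hypothetical true paths a = 0.5, b = 0.7, c' = 0.2
rng = random.Random(1)
x = [rng.gauss(0, 1) for _ in range(5000)]
m = [0.5 * xi + rng.gauss(0, 1) for xi in x]
y = [0.2 * xi + 0.7 * mi + rng.gauss(0, 1) for xi, mi in zip(x, m)]

a, b, c_prime = fit_simple_mediation(x, m, y)
print(a, b, c_prime)
```

With a sample this large the three estimates should land close to the values used in the simulation.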

14
Q

What is OLS regression analysis?

A

A fundamental statistical method used to estimate the relationship between a dependent variable and one or more independent variables.

It finds the linear equation that best predicts the dependent variable from the independent variables by minimising the sum of squared residuals.
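
For a single predictor, the least-squares line has a closed form. A minimal sketch, assuming one independent variable and made-up data that lie exactly on y = 1 + 2x; the function name `ols_line` is hypothetical.

```python
def ols_line(x, y):
    """Return (intercept, slope) of the line minimising the sum of squared residuals."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Made-up data on the line y = 1 + 2x
intercept, slope = ols_line([0, 1, 2, 3], [1, 3, 5, 7])
print(intercept, slope)
```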

15
Q

What are some assumptions the OLS regression analysis makes?

A
  • Linearity: The relationship between the independent and dependent variables is linear.
  • Independence: The residuals (errors) are independent of each other.
  • Homoscedasticity: The variance of the error terms is constant across all levels of the independent variables.
  • Normality: The residuals are normally distributed (particularly important for hypothesis testing regarding coefficients).
16
Q

What is the direct effect of X on Y?

A

c´ = adjusted mean difference

two cases that differ by one unit on X but are equal on M are estimated to differ c´ units on Y

-> adjusted for M (held constant)

17
Q

Why is M held constant in the estimation of the direct effect of X on Y?

A

Keeping M constant (or controlling for M) ensures that the direct effect of X on Y is isolated. This way, we can see how X influences Y directly, not through its effect on M. It’s like holding all other variables steady to focus solely on the relationship between X and Y.

18
Q

Does X have to be dichotomous in a simple mediation analysis?

A

In a simple mediation model, X can be any of the following:

  1. Continuous: For instance, X could represent hours of study, dosage of a medication, or levels of stress, where X takes on a range of values.
  2. Dichotomous: X has two categories, e.g. treatment vs. control.
  3. Categorical with More than Two Levels: For example, X could represent types of diets (vegetarian, vegan, omnivore) or levels of education (high school, bachelor’s, master’s, PhD).

When X is continuous, the mediation effect explains how a change in X (e.g., an increase in one unit of X) is associated with a change in Y, mediated by M.

When X is categorical, the mediation effect explains how being in one category of X compared to another (or others) is associated with changes in Y, mediated by M.

19
Q

What is the indirect effect of X on Y through M?

A

the combined effect of X on M and M on Y

ab

a = how much do two cases that differ by one unit on X, differ on M?

b = how much do two cases, that differ by one unit on M but are equal on X, differ on Y?

20
Q

What is the total effect of X on Y?

A

c´ + ab

the indirect effect of X on Y through M (ab) added to the direct effect of X on Y (c´)

Y = i_Y + c´X + bM + e_Y

Y = (i_Y + b·i_M) + (ab + c´)X + (e_Y + b·e_M)
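
The step between the two equations is a substitution of the model for M into the model for Y. Written out in LaTeX, using the same symbols as the two linear models of simple mediation:

```latex
\begin{aligned}
Y &= i_Y + c'X + bM + e_Y \\
  &= i_Y + c'X + b\,(i_M + aX + e_M) + e_Y \\
  &= (i_Y + b\,i_M) + (ab + c')\,X + (e_Y + b\,e_M)
\end{aligned}
```

The coefficient of X in the last line is the total effect, so c = c' + ab.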

21
Q

What would mediation analysis look like if multiple covariates are included?

A

Y = iy + cX + c1U1 + c2U2 + …

U = covariates

multiple independent variables, each with its own direct effect on Y

all Us are mediated by M,
so there are multiple indirect effects

M solely influences Y
(only one b, but multiple a)

22
Q

What is the concept of epiphenomenal associations in the context of multiple mediator models?

A

epiphenomenal association = an association between two variables that arises without one causally affecting the other
An epiphenomenon occurs alongside or in parallel to another process but does not directly influence or contribute to the primary process.

Applied to mediation analysis, if your proposed mediator M1 is not actually mediating the effect of X on Y yet is correlated with M2, which is a mediator of the effect of X on Y, a mediation analysis with M1 but not M2 in the model may nevertheless reveal a significant indirect effect of X on Y through M1.

23
Q

What are recommendations to account for these epiphenomenal relations?

A

Investigators interested in mediation through more than one mediator should estimate all the indirect effects in one multiple mediator model.

→ maximizing correspondence between theory and model

→ one indirect effect may be epiphenomenal (explanation for association between two variables, merely correlated, no causation)

→ it is possible to compare the size of indirect effects through different mediators

24
Q

How can statistical inference be made from this simple mediation model?

A

before: c = c´ + ab
→ sample-specific estimates
→ describe associations between variables in the data available

true values: tc, tc´, tatb
→ what statistical inference is about

generalisability beyond the sample

25
Q

How would the total effect be inferred statistically?

A

the sum of the direct effect of X on Y and indirect effect of X on Y through M

Although the total effect is the sum of two pathways of influence, it can be estimated by regressing Y on just X, without M in the model.

null hypothesis test
(H0: tc = 0)
”no association between X and Y”

26
Q

How can the direct effect be statistically inferred?

A

The direct effect quantifies the estimated difference in Y between two cases that are equal on M but differ by one unit on X.

Inference uses the standard method for any regression coefficient in a regression model:
null hypothesis test about tc´
-> is X related to Y independent of M?

27
Q

How can the indirect effect be inferred statistically?

A

The indirect effect quantifies how much two cases that differ by a unit on X are estimated to differ on Y as a result of X’s influence on M, which in turn influences Y.

X → M → Y causal chain of events

null hypothesis test about tatb or by constructing an interval estimate

28
Q

What is the normal theory approach?

A

product of coefficients, Sobel test, delta method

= the SE of ab is estimated and, assuming the sampling distribution of ab is normal, a p-value for ab is derived given a null-hypothesized value of tatb, or an interval estimate is constructed

29
Q

How would the second order standard error be calculated?

A

se_ab = sqrt(a^2·se_b^2 + b^2·se_a^2 + se_a^2·se_b^2)

and the test statistic for the indirect effect would be Z = ab / se_ab
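
The formula is easy to check numerically. A minimal sketch; the function name `sobel_second_order` and the input values are hypothetical and chosen only for illustration.

```python
import math

def sobel_second_order(a, b, se_a, se_b):
    """Second-order standard error of ab and the resulting Z statistic."""
    se_ab = math.sqrt(a**2 * se_b**2 + b**2 * se_a**2 + se_a**2 * se_b**2)
    z = (a * b) / se_ab
    return se_ab, z

# Hypothetical estimates and standard errors
se_ab, z = sobel_second_order(a=0.5, b=0.4, se_a=0.1, se_b=0.1)
print(se_ab, z)
```

Under the normality assumption, Z is compared against the standard normal distribution to obtain a p-value.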

30
Q

How can CIs be generated if you prefer interval estimates over null hypothesis testing?

A

ab - Zci% * seab < tatb < ab + Zci% * seab

Zci% for 95% is 1.96
-> fixed value, independent of sample data
-> corresponds to 97.5th percentile in normal distribution
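
The interval can be computed directly from ab and its standard error. A minimal sketch using only the Python standard library; the function name `normal_theory_ci` and the input values are hypothetical.

```python
from statistics import NormalDist

def normal_theory_ci(ab, se_ab, level=0.95):
    """Symmetric confidence interval for the indirect effect under normality."""
    # Critical value: the 97.5th percentile of the standard normal for a 95% interval
    z_crit = NormalDist().inv_cdf(1 - (1 - level) / 2)
    return ab - z_crit * se_ab, ab + z_crit * se_ab

# Hypothetical point estimate and standard error
lower, upper = normal_theory_ci(ab=0.2, se_ab=0.065)
print(lower, upper)
```

The interval is symmetric around ab by construction, which is exactly the property that becomes a problem when the sampling distribution of ab is skewed.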

31
Q

What are the flaws of the normal theory approach?

A

The method assumes that the sampling distribution of ab is normal. Simulation research has shown that it is among the lowest in power and generates confidence intervals that are less accurate than those of other methods.

32
Q

What other method can be used for statistical inference with correction of the former flaws?

A

bootstrap confidence interval

a resampling method

versatile: can be applied to many inferential problems,
e.g. when the behaviour of a statistic over repeated sampling is not known, too complicated or context-dependent

33
Q

What are the main principles of bootstrap methods?

A
  1. Resampling: From your original dataset of size N, create a “bootstrap sample” by randomly selecting N observations with replacement. This means some original observations may appear more than once in a bootstrap sample, while others may not appear at all.
  2. Calculate the Statistic: Compute the statistic of interest (e.g., mean, median, correlation coefficient) for the bootstrap sample.
  3. Repeat: Repeat steps 1 and 2 many times (usually thousands of times) to generate a distribution of the statistic. Each time, you’ll likely end up with a slightly different bootstrap sample and hence a slightly different statistic.
  4. Confidence Interval Estimation: Once you have a bootstrap distribution of your statistic, you can estimate its confidence interval. For a 95% confidence interval, you would typically take the 2.5th percentile and the 97.5th percentile of the bootstrap statistics as the lower and upper bounds, respectively.
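
Applied to the indirect effect ab, the four steps above can be sketched as follows. This is an illustrative sketch under stated assumptions, not a reference implementation: the function names, the simulated data, and the path values 0.6 are hypothetical, and the percentile cut-offs use simple index rounding.

```python
import random

def indirect_effect(x, m, y):
    """ab from OLS fits of M on X and of Y on X and M."""
    n = len(x)
    mx, mm, my = sum(x) / n, sum(m) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    smm = sum((mi - mm) ** 2 for mi in m)
    sxm = sum((xi - mx) * (mi - mm) for xi, mi in zip(x, m))
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    smy = sum((mi - mm) * (yi - my) for mi, yi in zip(m, y))
    a = sxm / sxx
    b = (smy * sxx - sxy * sxm) / (sxx * smm - sxm ** 2)
    return a * b

def percentile_bootstrap_ci(x, m, y, n_boot=1000, level=0.95, seed=0):
    rng = random.Random(seed)
    n = len(x)
    estimates = []
    for _ in range(n_boot):                               # step 3: repeat
        idx = [rng.randrange(n) for _ in range(n)]        # step 1: resample cases with replacement
        estimates.append(indirect_effect([x[i] for i in idx],
                                         [m[i] for i in idx],
                                         [y[i] for i in idx]))  # step 2: statistic
    estimates.sort()
    alpha = 1 - level                                     # step 4: percentile bounds
    lower = estimates[int(alpha / 2 * n_boot)]
    upper = estimates[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper

# Simulated data with hypothetical true paths a = 0.6, b = 0.6, c' = 0
rng = random.Random(42)
x = [rng.gauss(0, 1) for _ in range(300)]
m = [0.6 * xi + rng.gauss(0, 1) for xi in x]
y = [0.6 * mi + rng.gauss(0, 1) for mi in m]

lower, upper = percentile_bootstrap_ci(x, m, y)
print(lower, upper)
```

Whole cases (X, M, Y together) are resampled, so each bootstrap sample keeps the relationships between the three variables intact.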
34
Q

What are advantages of bootstrap methods?

A
  • Flexibility: Bootstrap methods do not require strong assumptions about the data distribution (e.g., normality).
  • Applicability: They can be used with complex statistics where traditional methods for confidence interval estimation are not available or difficult to apply.
  • Intuitive: The process of resampling with replacement is conceptually straightforward and easy to implement with modern computing power.
35
Q

What is a good sample for bootstrap methods?

A

unbiased sample
-> the quality of the sample as a representation of the population matters
-> bootstrapping assumes the sample represents the population well
-> otherwise the method will inherit the sample’s biases

large sample
-> otherwise outliers have more influence
-> normality is more likely

number of resampling
observation selection should be fixed

36
Q

Why are alternative approaches to confidence interval inference needed?

A

The sampling distribution of the indirect effect (a*b) is often not symmetric, which can lead to biased estimates if one assumes normality. This asymmetry arises because the product of two normally distributed variables (like a and b) does not follow a normal distribution itself.

37
Q

What alternative inference approaches exist?

A
  1. Monte Carlo Confidence Intervals (Simulation-Based):
    • This approach simulates a large number of values of a and b from their estimated sampling distributions (typically normal distributions centred on the estimates, with the standard errors as standard deviations) rather than resampling the observed data. For each simulated pair, the product ab is computed. The simulated products form an empirical distribution, and the confidence interval is taken from its percentiles. This method naturally accommodates the asymmetry in the distribution of the indirect effect.
  2. Distribution of the Product Approach:
    • This is a more mathematically complex method that involves approximating the sampling distribution of the product of a and b. The method acknowledges the non-normality and potential skewness of this distribution. By employing the distribution of the product approach, researchers can derive confidence intervals that better reflect the actual, skewed distribution of the indirect effect.
  3. Transformation of ab to a Standardized Metric:
    • Sometimes, the product of a and b is transformed into a standardized metric to facilitate the estimation of confidence intervals. One common transformation is to standardize the indirect effect by its standard error before using a standard normal distribution to derive confidence intervals. However, this approach also needs to consider the underlying distribution’s skewness and kurtosis.
  4. Upper and Lower Bounds:
    • In the context of asymmetric confidence intervals, the upper and lower bounds are not equidistant from the point estimate of the indirect effect. This asymmetry in the bounds reflects the underlying distribution’s shape, providing a more accurate and informative interval estimate for the indirect effect.
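
A Monte Carlo interval is commonly built by drawing a and b from normal distributions centred on their estimates and taking percentiles of the simulated products. The sketch below assumes that a and b are independent and normally distributed; the function name `monte_carlo_ci` and the input values are hypothetical.

```python
import random

def monte_carlo_ci(a, se_a, b, se_b, n_sim=20000, level=0.95, seed=0):
    """Percentile interval for ab from simulated draws of a and b."""
    rng = random.Random(seed)
    # Each draw pairs one simulated a with one simulated b (assumed independent)
    products = sorted(rng.gauss(a, se_a) * rng.gauss(b, se_b) for _ in range(n_sim))
    alpha = 1 - level
    lower = products[int(alpha / 2 * n_sim)]
    upper = products[int((1 - alpha / 2) * n_sim) - 1]
    return lower, upper

# Hypothetical estimates and standard errors
lower, upper = monte_carlo_ci(a=0.5, se_a=0.1, b=0.4, se_b=0.1)
print(lower, upper)
```

Because the bounds come from percentiles of the simulated products, they need not be equidistant from the point estimate ab.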
38
Q

What effect does the inference method have on the results?

A

the methods tend to produce the same substantive inference about the indirect effect, but sometimes they will not

the choice depends on relative concern about type 1 errors (claiming an indirect effect exists when it does not) and type 2 errors (failing to detect an indirect effect that is real)

Sobel test -> higher type 2 error rate
bias-corrected bootstrap -> can inflate the type 1 error rate
percentile bootstrap CI has become the recommended method

39
Q

What are the two steps involved in a mediation analysis?

A
  1. Conduct a simple linear regression, including the X and Y variables to see if a relationship exists between the two.
  2. Conduct a Mediation Analysis, including the X, Y and M, to investigate if the relationship between X and Y is changed in the presence of variable M.
40
Q

How is step 1: simple linear regression conducted?

A

the ANOVA p-value and the coefficient p-value
-> identical in a simple linear regression

is the effect of X on Y significant?

if yes, consider the unstandardized coefficient
-> direction of the relationship

41
Q

In step 2: mediation analysis, how should the direct effect be interpreted?

A
  1. If the p-value is no longer significant then this suggests the M variable is fully mediating the relationship between X and Y.
  2. If the p-value is still significant then this does not mean that M does not mediate at all but that the effect of M on the X to Y relationship is not the only explanation for why changes in X lead to changes in Y.
42
Q

In step 2: mediation analysis, how should the indirect effect be interpreted?

A
  1. To establish the significance, you need to look at the bootstrapped lower limit confidence interval (LLCI) and the upper limit confidence interval (ULCI).
    1. If the range from LLCI to ULCI includes zero, then the indirect effect is not significant.
    2. If the range from LLCI to ULCI does not include zero, then the indirect effect is significant.
    3. The coefficient of the indirect effect can be calculated by multiplying the coefficient of X to M with the coefficient of M to Y. (ab)
43
Q

How can the total effect be calculated?

A

To calculate the total effect, you need to add the coefficient for the direct and indirect effect together.
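
With OLS estimates this sum reproduces exactly the coefficient obtained by regressing Y on X alone. A minimal numerical check on a small made-up data set; the variable and function names are hypothetical.

```python
def cross_products(u, v):
    """Sum of cross-products of deviations from the means."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    return sum((ui - mu) * (vi - mv) for ui, vi in zip(u, v))

# Small made-up data set, for illustration only
x = [1, 2, 3, 4, 5, 6]
m = [2, 1, 4, 3, 6, 5]
y = [1, 3, 2, 5, 4, 7]

sxx, smm, sxm = cross_products(x, x), cross_products(m, m), cross_products(x, m)
sxy, smy = cross_products(x, y), cross_products(m, y)
det = sxx * smm - sxm ** 2

a = sxm / sxx                              # X on M
b = (smy * sxx - sxy * sxm) / det          # M on Y, X held constant
c_prime = (sxy * smm - smy * sxm) / det    # direct effect
c = sxy / sxx                              # total effect: Y regressed on X only

total_from_parts = c_prime + a * b
print(c, total_from_parts)
```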

44
Q

What are limitations of mediation analysis?

A
  • require large sample sizes
  • assumption of no mediated moderation (constant across levels of the independent variable)
  • multicollinearity
    -> if mediator and X are highly correlated it might be difficult to disentangle separate effects on Y
  • mediator-outcome confounding, and other confounds
    -> failing to control for confounds to M-Y
  • assumption of linearity
  • temporal ambiguity
    X-M-Y
    -> what if that order is not always the case?
  • model specification
    statistical accuracy/power
  • measurement error
  • causality
    a mediation model alone does not prove causation