ARM - week 2 Flashcards
regression
describes the mathematical relationship between an outcome and one or more other variables. Regression can adjust for several confounders and/or intermediate variables at the same time.
ordinary least squares (OLS) regression
Each dot represents 1 person. The straight line is the regression line, which comes from an equation. The software draws the line so that the sum of the squared differences between the fitted line and the observations is as small as possible.
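A minimal sketch of what "least squares" means for a single explanatory variable. The five height/weight pairs are hypothetical, chosen so that the fitted slope comes out at roughly 0.80; the function names are illustrative and not taken from any particular statistics package.

```python
# Hypothetical data: height in cm and weight in kg for five people.
heights = [160.0, 165.0, 170.0, 175.0, 180.0]
weights = [58.0, 63.0, 65.0, 71.0, 74.0]


def fit_ols(x, y):
    """Return (intercept, slope) of the least-squares line through (x, y)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Cross-products and squares of deviations from the means.
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def sum_squared_residuals(x, y, intercept, slope):
    """Sum of squared vertical distances between the line and the dots."""
    return sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))


intercept, slope = fit_ols(heights, weights)
best = sum_squared_residuals(heights, weights, intercept, slope)
# Any other line, for example a slightly steeper one, fits worse.
worse = sum_squared_residuals(heights, weights, intercept, slope + 0.05)

print(f"slope = {slope:.2f} kg per cm, intercept = {intercept:.1f} kg")
print(f"sum of squares: fitted line {best:.2f}, steeper line {worse:.2f}")
```

The comparison at the end illustrates the criterion: changing the slope away from the fitted value can only increase the sum of squared differences.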
coefficient meaning
The coefficient (in this case 0.80) describes the slope. The coefficient has an interpretation → for every additional centimeter of height, people are expected to be 800 grams (0.80 kg) heavier.
Coefficients have a meaning → it is not just ‘positive’ or ‘negative’. Coefficients are expressed in units of the outcome per unit of the explanatory variable (here kilograms per centimeter).
The regression equation can be used to predict the outcome → the prediction is the average for people with the same characteristics.
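As a worked illustration of using the equation for prediction: the slope 0.80 is the value from the card above, while the intercept is not given there and is left as the symbol \beta_0.

```latex
\begin{align*}
\widehat{\text{weight}} &= \beta_0 + 0.80 \times \text{height} \\
\widehat{\text{weight}}(180\ \text{cm}) - \widehat{\text{weight}}(170\ \text{cm})
  &= 0.80 \times (180 - 170) = 8\ \text{kg}
\end{align*}
```

So two groups of people who differ by 10 cm in height are predicted to differ by 8 kg in average weight, whatever the value of the intercept.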
OLS regression in software
- Tell the software what the outcome variable is and what the explanatory variables are
- Software finds coefficients with the best fit → least squares
- Software does not distinguish between exposure and confounders
- Software cannot tell whether the results are correct
moderation
It is possible that the effect of height on weight differs between men and women. This can be analyzed by adding an interaction term to the regression analysis.
In this case the interaction term represents the additional height effect for women. The value of this interaction term is 0 for men, and for women the value is their height. The coefficient of the interaction term means that each extra centimeter of height adds 67 grams less weight for women than it does for men.
Moderation is not clearly part of the DAG concept → it is not about bias. Moderation is about distinguishing between subgroups instead of taking the average. The coefficient of the interaction term represents the additional effect in a subgroup compared to the other subgroup. The interpretation of the interaction term is easier when you fill out the regression equations of the subgroups.
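Filling out the subgroup equations could look like the sketch below. The symbols \beta_0 to \beta_3 and the separate main effect for sex (\beta_2) are assumptions for illustration; only the interaction coefficient of 67 grams less per centimeter (\beta_3 = -0.067 kg per cm) comes from the card above.

```latex
\begin{align*}
\widehat{\text{weight}} &= \beta_0 + \beta_1\,\text{height} + \beta_2\,\text{female}
  + \beta_3\,(\text{female} \times \text{height}) \\
\text{men } (\text{female} = 0):\quad
  \widehat{\text{weight}} &= \beta_0 + \beta_1\,\text{height} \\
\text{women } (\text{female} = 1):\quad
  \widehat{\text{weight}} &= (\beta_0 + \beta_2) + (\beta_1 + \beta_3)\,\text{height}
\end{align*}
```

The slope for men is \beta_1 and the slope for women is \beta_1 + \beta_3, so with \beta_3 = -0.067 each extra centimeter adds 67 grams less weight for women than for men.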
Wheelan’s warnings about regression analysis
- non-linearity
- multicollinearity
- extrapolating beyond the data
- reverse causality
- omitted variable bias
non-linearity
the relationship may be quadratic or logarithmic instead of linear
multicollinearity
explanatory variables that cannot be distinguished (if all the men in the data are old and all the women are young, you can no longer distinguish the effect of age from the effect of sex)
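A small sketch of the example above, using hypothetical data in which every man is old and every woman is young. The variable and function names are illustrative only.

```python
# Hypothetical data: 1 = male / old, 0 = female / young.
# Every man is old and every woman is young, so the columns are identical.
male = [1, 1, 1, 0, 0, 0]
old = [1, 1, 1, 0, 0, 0]


def centered_sums(x, y):
    """Return (sxx, syy, sxy): sums of squares and cross-products of deviations."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxx, syy, sxy


sxx, syy, sxy = centered_sums(male, old)
correlation = sxy / (sxx * syy) ** 0.5
# Least squares needs this determinant to be non-zero in order to find
# a unique pair of coefficients for the two variables.
determinant = sxx * syy - sxy ** 2

print("correlation between sex and age group:", correlation)
print("determinant:", determinant)
```

With perfectly overlapping variables the correlation is 1 and the determinant is 0, which is the numerical counterpart of "you can no longer distinguish the effect of age from the effect of sex".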
extrapolating beyond the data
results are only valid for populations similar to the one in the data, and within the range of values that was observed
reverse causality
the exposure does not have an effect on the outcome, but the outcome has an effect on the exposure (should be visible in a DAG)
omitted variable bias
unresolved confounding: a confounder that is left out of the model biases the estimated effect
normal causal inference question
“what is the effect of X on Y?”
mediation question / mediation analysis
“why does X have this effect, is it because of M?” and “if we do something about M, would that reduce the effect of X on Y?”.
In a mediation analysis, causal paths may have to be blocked.
mediator, intermediate variable
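a variable that lies on the causal path between exposure and outcome (X → M → Y), so that part of the effect of X on Y runs through it
Adjusting for an intermediate yields an estimate of the partial (direct) causal effect. Not adjusting for an intermediate yields an estimate of the full (total) effect.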
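One common way to write this down is sketched below. It assumes linear relationships, no interaction between X and M, and no unmeasured confounding; the symbols a, b and c' are illustrative and do not come from the cards.

```latex
\begin{align*}
M &= a_0 + a\,X + \varepsilon_M \\
Y &= b_0 + c'\,X + b\,M + \varepsilon_Y \\
\text{full effect of } X \text{ on } Y &= \underbrace{c'}_{\text{direct}}
  + \underbrace{a\,b}_{\text{through } M}
\end{align*}
```

Under these assumptions, a regression of Y on X alone estimates c' + ab, while a regression that also includes M estimates only the direct part c'.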
how to adjust for a path with a collider
adjust for the collider AND 1 other (non-collider) variable on the path. Adjusting for the collider alone opens the path; the additional adjustment will block the backdoor path again
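The rule can be checked mechanically on a single path. The sketch below uses a hypothetical backdoor path X ← A → C ← B → Y, in which C is the collider. It is a simplification: it looks at one path only and ignores adjustment for descendants of a collider.

```python
# Hypothetical DAG fragment: each pair (cause, effect) is one arrow.
edges = {("A", "X"), ("A", "C"), ("B", "C"), ("B", "Y")}
# The backdoor path from X to Y that runs through the collider C.
path = ["X", "A", "C", "B", "Y"]


def is_collider(path, edges, i):
    """path[i] is a collider if both neighbouring arrows point into it."""
    previous_node, node, next_node = path[i - 1], path[i], path[i + 1]
    return (previous_node, node) in edges and (next_node, node) in edges


def path_is_blocked(path, edges, adjusted):
    """A path is blocked by an unadjusted collider or an adjusted non-collider."""
    for i in range(1, len(path) - 1):
        node = path[i]
        if is_collider(path, edges, i):
            if node not in adjusted:
                return True  # an unadjusted collider blocks the path
        elif node in adjusted:
            return True  # an adjusted non-collider blocks the path
    return False


print("no adjustment:     blocked =", path_is_blocked(path, edges, set()))
print("adjust for C:      blocked =", path_is_blocked(path, edges, {"C"}))
print("adjust for C and A: blocked =", path_is_blocked(path, edges, {"C", "A"}))
```

Without adjustment the collider keeps the path closed, adjusting for C alone opens it, and adding A (or B) to the adjustment set closes it again.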