ARM - week 2 Flashcards

1
Q

regression

A

Describes the mathematical relationship between an outcome and one or more other variables. Regression can adjust for several confounders and/or intermediate variables at the same time.

2
Q

ordinary least squares regression.

A

In the scatterplot, each dot represents one person and the straight line is the regression line, which comes from an equation. The software draws the line so that the sum of the squared differences between the fitted line and the observations is as small as possible (hence "least squares").
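A minimal sketch of what the software does with one explanatory variable, assuming made-up height/weight data (the numbers below are hypothetical, not from the course): the least-squares slope and intercept have a closed form, and shifting or tilting the line makes the sum of squared differences larger.

```python
def fit_ols(x, y):
    """Simple linear regression: return (intercept, slope) minimising the sum of squared residuals."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def sum_squared_residuals(x, y, intercept, slope):
    """Sum of squared vertical distances between the observations and a line."""
    return sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))


# Hypothetical data: height in cm and weight in kg, one pair per person (one dot each)
height = [160, 165, 170, 175, 180, 185, 190]
weight = [55, 62, 63, 70, 72, 79, 80]

b0, b1 = fit_ols(height, weight)
best = sum_squared_residuals(height, weight, b0, b1)
print(f"weight = {b0:.2f} + {b1:.3f} * height, sum of squares = {best:.2f}")

# Any other line fits worse: a slightly steeper one, or one shifted upwards
print(sum_squared_residuals(height, weight, b0, b1 + 0.05) > best)
print(sum_squared_residuals(height, weight, b0 + 1, b1) > best)
```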

3
Q

coefficient meaning

A

The coefficient (in this case 0.80) describes the slope. It has a concrete interpretation → for every additional centimeter of height, people are expected to be 0.80 kg (800 grams) heavier.

Coefficients have a meaning → they are not just 'positive' or 'negative'. Coefficients are expressed on the scale of the outcome, per unit of the explanatory variable (here kg per cm).

The regression equation can be used to predict the outcome → the prediction is the average for people with the same characteristics.
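A small sketch of using the equation for prediction. Only the 0.80 slope comes from the card; the intercept of -70 is a hypothetical value chosen for illustration. Whatever the intercept, two people who differ by 1 cm get predictions that differ by exactly the coefficient.

```python
# Hypothetical fitted equation: weight (kg) = INTERCEPT + 0.80 * height (cm).
# The slope is the card's coefficient; the intercept is made up for illustration.
INTERCEPT = -70.0
SLOPE_KG_PER_CM = 0.80


def predicted_weight(height_cm):
    """Predicted weight in kg: the average for people of this height."""
    return INTERCEPT + SLOPE_KG_PER_CM * height_cm


difference_kg = predicted_weight(181) - predicted_weight(180)
print(f"Predicted weight at 180 cm: {predicted_weight(180):.1f} kg")
print(f"One extra cm adds {difference_kg * 1000:.0f} g")
```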

4
Q

OLS regression in software

A
  • Tell the software what the outcome variable is and what the explanatory variables are
  • Software finds coefficients with the best fit → least squares
  • Software does not distinguish between exposure and confounders
  • Software cannot tell whether the results are correct
5
Q

moderation

A

It is possible that height works differently for men than for women. This can be analyzed by adding an interaction term to the regression analysis.

In this case the interaction term represents the additional height effect for women: its value is 0 for men and equal to the person's height for women. The coefficient of the interaction term means that each extra centimeter of height adds 67 grams less weight for women than it does for men.

Moderation is not clearly part of the DAG concept → it is not about bias. Moderation is about distinguishing between subgroups instead of estimating one average effect. The coefficient of the interaction term represents the additional effect in one subgroup compared to the other. The interaction term is easier to interpret when you fill in the regression equation for each subgroup.
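Filling in the subgroup equations can be sketched as follows. Only the interaction coefficient (-0.067 kg per cm, i.e. 67 g less per cm for women) comes from the card; the intercept, the height coefficient for men and the female main effect are hypothetical values.

```python
# Hypothetical model: weight = b0 + b1*height + b2*female + b3*(female*height)
# Only b3 = -0.067 kg/cm comes from the card; b0, b1 and b2 are made up.
b0, b1, b2, b3 = -70.0, 0.80, 5.0, -0.067


def subgroup_equation(female):
    """Fill in female = 0 (men) or 1 (women); return (intercept, slope) of that subgroup's line."""
    intercept = b0 + b2 * female
    slope = b1 + b3 * female  # the interaction term is 0 for men and equals height for women
    return intercept, slope


for label, female in (("men", 0), ("women", 1)):
    intercept, slope = subgroup_equation(female)
    print(f"{label}: weight = {intercept:.1f} + {slope:.3f} * height")
```

The slope for women is the slope for men plus the interaction coefficient, which is exactly the "additional effect in one subgroup" described above.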

6
Q

Wheelan’s warnings about regression analyses

A
  1. non-linearity
  2. multicollinearity
  3. extrapolating beyond the data
  4. reverse causality
  5. omitted variable bias
7
Q

non-linearity

A

the relationship may be quadratic or logarithmic instead of linear, so a straight regression line describes it poorly
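A toy illustration with hypothetical data: a perfectly quadratic (U-shaped) relationship gets an OLS slope of zero, so a straight line would suggest "no relationship" even though Y is completely determined by X. Modelling the quadratic shape (using X² as the explanatory variable) recovers it.

```python
def ols_slope(x, y):
    """Least-squares slope of a straight line through the points (x, y)."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    return sxy / sxx


x = [-2, -1, 0, 1, 2]
y = [xi ** 2 for xi in x]  # hypothetical: Y depends perfectly on X, but not linearly

print(ols_slope(x, y))                     # 0.0: the straight line misses the relationship
print(ols_slope([xi ** 2 for xi in x], y)) # 1.0: once the quadratic shape is modelled
```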

8
Q
Multicollinearity
A

explanatory variables that are so strongly related that their effects cannot be distinguished (e.g., if all the men in the data are old and all the women are young, you can no longer distinguish the effect of age from the effect of sex)
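A sketch of why the software cannot separate age and sex in the card's example, using hypothetical data. OLS solves the normal equations (X'X)b = X'y; when one column is an exact combination of the others, X'X has determinant 0 and there is no unique solution. With overlapping ages the determinant is non-zero again.

```python
def det3(m):
    """Determinant of a 3x3 matrix given as a list of rows."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def cross_products(columns):
    """X'X for a design matrix given as a list of columns."""
    return [[sum(u * v for u, v in zip(ci, cj)) for cj in columns] for ci in columns]


# Hypothetical data: every man (male = 1) is 60, every woman (male = 0) is 30
male = [1, 1, 1, 0, 0, 0]
age = [60 if m else 30 for m in male]
intercept = [1] * len(male)

xtx = cross_products([intercept, male, age])
print(det3(xtx))  # 0: age = 30 + 30*male, so the effects cannot be separated

age_mixed = [60, 35, 50, 30, 55, 25]  # hypothetical ages that overlap between the sexes
print(det3(cross_products([intercept, male, age_mixed])))  # non-zero: a unique solution exists
```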

9
Q
Extrapolating beyond the data
A

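results are only valid within the range of the data and in similar populations; predictions for values or populations outside that range are unreliable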

10
Q
Reverse causality
A

the exposure does not have an effect on the outcome; instead, the outcome has an effect on the exposure (this should be visible in a DAG)

11
Q
Omitted variable bias
A

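unresolved confounding: a confounder that affects both the exposure and the outcome is left out of the model, which biases the estimated effect of the exposure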
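A toy example of omitted variable bias with hypothetical data constructed so that a confounder Z drives both X and Y while X itself has no effect on Y. Regressing Y on X alone produces a spurious slope; adding Z to the model brings the X coefficient back to 0. The two-variable fit solves the 2x2 normal equations on centred data.

```python
def centered(v):
    """Subtract the mean from every value."""
    mean_v = sum(v) / len(v)
    return [vi - mean_v for vi in v]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def slope_one(x, y):
    """OLS slope of y on x only (no adjustment)."""
    dx, dy = centered(x), centered(y)
    return dot(dx, dy) / dot(dx, dx)


def slopes_two(x, w, y):
    """OLS slopes of y on x and w together, via the 2x2 normal equations on centred data."""
    dx, dw, dy = centered(x), centered(w), centered(y)
    sxx, sww, sxw = dot(dx, dx), dot(dw, dw), dot(dx, dw)
    sxy, swy = dot(dx, dy), dot(dw, dy)
    det = sxx * sww - sxw ** 2
    return (sww * sxy - sxw * swy) / det, (sxx * swy - sxw * sxy) / det


# Hypothetical data: confounder Z affects X and Y; X has NO effect on Y
z = [0, 0, 1, 1]
x = [-1, 1, 0, 2]         # X = Z + (-1, 1, -1, 1)
y = [2 * zi for zi in z]  # Y = 2 * Z

print(slope_one(x, y))      # 0.4: omitting Z makes X appear to have an effect
print(slopes_two(x, z, y))  # (0.0, 2.0): adjusting for Z removes the spurious effect
```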

12
Q

normal causal inference question

A

“what is the effect of X on Y?”

13
Q

mediation question / mediation analyses

A

“why does X have this effect, is it because of M?” and “if we do something about M, would that reduce the effect of X on Y?”.

In a mediation analysis, causal paths may have to be blocked.

14
Q

mediator, intermediate variable

A

a variable that lies between the exposure and the outcome (X → M → Y), creating another causal path

Adjusting for an intermediate leads to an estimate of the partial (direct) causal effect. Not adjusting for an intermediate leads to an estimate of the full (total) effect.
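A toy illustration with hypothetical data built so that X affects Y directly (effect 1) and through the mediator M (X → M with effect 1, M → Y with effect 2). Without adjustment the slope is the full effect 1 + 1·2 = 3; adjusting for M leaves only the direct effect 1. The fitting helpers solve the ordinary least-squares normal equations on centred data.

```python
def centered(v):
    """Subtract the mean from every value."""
    mean_v = sum(v) / len(v)
    return [vi - mean_v for vi in v]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def slope_one(x, y):
    """OLS slope of y on x only (no adjustment)."""
    dx, dy = centered(x), centered(y)
    return dot(dx, dy) / dot(dx, dx)


def slopes_two(x, w, y):
    """OLS slopes of y on x and w together, via the 2x2 normal equations on centred data."""
    dx, dw, dy = centered(x), centered(w), centered(y)
    sxx, sww, sxw = dot(dx, dx), dot(dw, dw), dot(dx, dw)
    sxy, swy = dot(dx, dy), dot(dw, dy)
    det = sxx * sww - sxw ** 2
    return (sww * sxy - sxw * swy) / det, (sxx * swy - sxw * sxy) / det


# Hypothetical data: X -> M -> Y and X -> Y
x = [0, 0, 1, 1]
m = [-1, 1, 0, 2]                                # M = X + (-1, 1, -1, 1)
y = [1 * xi + 2 * mi for xi, mi in zip(x, m)]   # Y = 1*X + 2*M

print(slope_one(x, y))      # 3.0: full (total) effect of X
print(slopes_two(x, m, y))  # (1.0, 2.0): direct effect of X after adjusting for M
```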

15
Q

how to adjust for a path with a collider

A

Adjusting for a collider opens the path. If you adjust for the collider, also adjust for one other (non-collider) variable on that path; this blocks the backdoor path again.
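A simplified sketch of the blocking rule for a single path. The path X ← A → C ← B → Y and the variable names are hypothetical, and descendants of colliders are ignored (a full d-separation check would also consider them). A path is blocked if it contains an adjusted non-collider or an unadjusted collider.

```python
def is_collider(arrows, i):
    """Node i on the path is a collider if both neighbouring arrows point into it."""
    return arrows[i - 1] == "->" and arrows[i] == "<-"


def path_is_blocked(nodes, arrows, adjusted):
    """
    Simplified blocking check for ONE path.
    nodes:    variables along the path, e.g. ["X", "A", "C", "B", "Y"]
    arrows:   arrows[k] links nodes[k] and nodes[k+1]; "->" means nodes[k] -> nodes[k+1]
    adjusted: set of variables adjusted for
    Simplification: descendants of colliders are not considered.
    """
    for i in range(1, len(nodes) - 1):
        if is_collider(arrows, i):
            if nodes[i] not in adjusted:
                return True  # an unadjusted collider blocks the path
        elif nodes[i] in adjusted:
            return True      # an adjusted non-collider blocks the path
    return False


# Hypothetical backdoor path X <- A -> C <- B -> Y, with collider C
nodes = ["X", "A", "C", "B", "Y"]
arrows = ["<-", "->", "<-", "->"]

print(path_is_blocked(nodes, arrows, set()))       # True: the collider closes the path
print(path_is_blocked(nodes, arrows, {"C"}))       # False: adjusting for C opens it
print(path_is_blocked(nodes, arrows, {"C", "A"}))  # True: adjusting for A as well blocks it again
```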

16
Q

p-value

A

The probability of finding this association (or a stronger one) in a sample if the real association is 0 → if the null hypothesis is true, how likely is it to find this association? The result is statistically significant if the p-value is below a certain threshold (the significance level, α), often 0.05.

The p-value is the probability of finding these results (or more extreme ones) if the null hypothesis is true. This is not the same as the probability that the null hypothesis is true, given the results.
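A minimal sketch of a two-sided p-value under a normal approximation: how likely an estimate at least this far from 0 is if the real association is 0. The estimate and standard error are hypothetical numbers.

```python
from math import erfc, sqrt


def two_sided_p_value(estimate, standard_error):
    """Two-sided p-value for H0: true association = 0, using the normal approximation."""
    z = estimate / standard_error
    # erfc(|z| / sqrt(2)) equals 2 * (1 - Phi(|z|)), Phi being the standard normal CDF
    return erfc(abs(z) / sqrt(2))


alpha = 0.05                        # significance level chosen by the researcher
p = two_sided_p_value(0.80, 0.30)   # hypothetical estimate and standard error
print(f"p = {p:.4f}; significant at alpha = {alpha}: {p < alpha}")
```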

17
Q

The probability that a significant result indicates a true effect depends on:

A
  1. the prior probability (e.g., how many of 100 tested drugs are actually effective)
  2. the power (how likely it is that you find a significant result if the drug is actually effective)
  3. the significance level
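The three ingredients combine via Bayes' rule. A sketch with hypothetical numbers: if 10 of 100 drugs truly work, power is 0.80 and α is 0.05, there are 8 true positives and 4.5 expected false positives, so only 64% of significant results reflect a real effect.

```python
def prob_true_given_significant(prior, power, alpha):
    """Share of significant results that reflect a real effect (Bayes' rule)."""
    true_positives = prior * power           # effective drugs that test significant
    false_positives = (1 - prior) * alpha    # ineffective drugs that test significant
    return true_positives / (true_positives + false_positives)


# Hypothetical: 100 drugs, 10 truly effective, power 0.80, alpha 0.05
print(round(prob_true_given_significant(prior=0.10, power=0.80, alpha=0.05), 2))  # 0.64
```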
18
Q

what is not p value/ significance

A
  • The p-value is not the probability that there is a real effect
  • The p-value is not the probability that the exact estimate is correct
  • The significance level is not a law of nature
  • Significance is not about causality
  • Significance is not proof, but it is evidence
  • Significant does not mean ‘relevant’, ‘substantial’ or ‘interesting’
19
Q
Significance level α
A

The significance level is set by the researcher → it does not have to be 0.05. A lower significance level results in fewer significant results → there are fewer false-positive results, but more false-negative results.
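A sketch of how the threshold moves with α for a two-sided test under the normal approximation. The observed test statistic of 2.2 is hypothetical: the same result counts as significant at α = 0.05 but not at α = 0.01.

```python
from statistics import NormalDist


def critical_z(alpha):
    """Two-sided critical value of the standard normal for significance level alpha."""
    return NormalDist().inv_cdf(1 - alpha / 2)


z_observed = 2.2  # hypothetical test statistic
for alpha in (0.10, 0.05, 0.01):
    c = critical_z(alpha)
    print(f"alpha = {alpha:.2f}: critical |z| = {c:.3f}, significant = {abs(z_observed) > c}")
```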

20
Q
Power
A

Power is the chance of finding a significant result in a sample if the effect is real. Power is affected by the sample size (the larger the sample, the higher the power), the real association in the population (the stronger the association, the higher the power) and the variability in the population (if the outcomes are similar, the power is higher; if they vary a lot, the power is lower). The association and variability in the population are unknown.
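A minimal sketch of the three ingredients, assuming a two-sided one-sample z-test of "mean = 0" with a known outcome SD (all scenario numbers are hypothetical). Changing one ingredient at a time shows power rising with sample size and effect size and falling with variability.

```python
from math import sqrt
from statistics import NormalDist


def power_z_test(effect, sd, n, alpha=0.05):
    """Power of a two-sided z-test of H0: mean = 0 when the true mean is `effect`
    and the outcome SD is `sd` (assumed known)."""
    std_normal = NormalDist()
    se = sd / sqrt(n)
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    shift = effect / se
    # probability that the test statistic falls beyond either critical value
    return std_normal.cdf(shift - z_crit) + std_normal.cdf(-shift - z_crit)


print(f"baseline        : {power_z_test(effect=0.5, sd=2, n=25):.2f}")
print(f"larger sample   : {power_z_test(effect=0.5, sd=2, n=100):.2f}")
print(f"stronger effect : {power_z_test(effect=1.0, sd=2, n=25):.2f}")
print(f"less variability: {power_z_test(effect=0.5, sd=1, n=25):.2f}")
```

With a true effect of 0 the function returns α itself, since every significant result is then a false positive.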

The usefulness of p-values is limited, even when they are used correctly. The p-value is not a measure of precision; it is only about the difference from 0. Strictly, the null hypothesis is always wrong, and the difference from 0 is usually not interesting. Whether a drug 'works' depends on the strength of the effect.

21
Q
Prior probability
A

The prior probability is unknown. However, the plausibility of the hypothesis can be assessed with subject knowledge. Neither the power nor the prior probability can be quantified exactly. This means that you can’t say exactly how likely it is that a significant result reflects a true association → it’s probably less than 95%.

22
Q

published results

A

Mathematically, you can show that significant associations are likely to be overestimates → if your estimate is lower than the actual value, the result is less likely to be significant, and if it is higher than the actual value, the result is more likely to be significant. So published results are also likely to be overestimates, because mainly statistically significant studies get published. Focusing on statistical significance takes attention away from the size of an effect.
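A small simulation sketch of this selection effect, with a hypothetical true effect and standard error. Many studies estimate the same effect; keeping only the significant ones (as publication does) shifts the average estimate well above the true value, while the average of all estimates stays close to it.

```python
import random
from statistics import mean

TRUE_EFFECT = 0.2      # hypothetical true effect
STANDARD_ERROR = 0.15  # hypothetical standard error of each study's estimate
CRITICAL_Z = 1.96      # two-sided alpha = 0.05

rng = random.Random(2024)
estimates = [rng.gauss(TRUE_EFFECT, STANDARD_ERROR) for _ in range(10_000)]
significant = [e for e in estimates if abs(e / STANDARD_ERROR) > CRITICAL_Z]

print(f"mean of all estimates        : {mean(estimates):.3f}")
print(f"mean of significant estimates: {mean(significant):.3f}")
print(f"share significant (power)    : {len(significant) / len(estimates):.2f}")
```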

23
Q

Principles on p-values stated by the American Statistical Association

A
  • P-values can indicate how incompatible the data are with a specified statistical model
  • P-values don’t measure the probability that the studied hypothesis is true, or the probability that the data were produced by random chance alone
  • Scientific conclusions and business or policy decisions should not be based only on whether a p-value passes a specific threshold
  • Proper inference requires full reporting and transparency
  • A p-value or statistical significance does not measure the size of an effect or the importance of a result
  • By itself, a p-value does not provide a good measure of evidence regarding a model or hypothesis
24
Q

testing

A

Testing gives a dichotomous result → yes (there is enough evidence) or no (there is not enough evidence). If the result is significant, there is evidence for the association; if it is not significant, there is not enough evidence for the association. However, absence of evidence is not evidence of absence → a non-significant result doesn’t mean that there is no true effect. This yes/no answer is not really interesting.

25
Q

estimation

A

Estimation is about the size of the estimated effect. Estimation is much more informative than testing. Estimation is implicitly part of the original question. Estimation has a relationship with theory → estimation requires intuitively interpretable outcome measures.

26
Q

Confidence intervals

A
  • Lower bound of the 95% interval → estimate - 1.96 · standard error
  • Upper bound of the 95% interval → estimate + 1.96 · standard error

Uncertainty can be reduced by taking a larger sample → with a larger sample the standard error will be smaller and the confidence interval will be narrower.

A 95% confidence interval means that the procedure has a 95% probability of producing an interval that contains the true value → if the sample were drawn many times and an interval calculated each time, 95% of those intervals would contain the true value.
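A sketch of both points, assuming a normal outcome with a known SD of 10 and a hypothetical mean of 70 (so the simple estimate ± 1.96 · SE interval applies). The first loop shows intervals narrowing as n grows; the simulation repeats the sampling many times and counts how often the interval contains the true mean.

```python
import random
from math import sqrt

Z_95 = 1.96


def confidence_interval(estimate, standard_error, z=Z_95):
    """Normal-approximation 95% CI: estimate -/+ 1.96 * standard error."""
    return estimate - z * standard_error, estimate + z * standard_error


# Larger samples -> smaller standard error -> narrower interval (hypothetical SD = 10)
sd = 10
for n in (25, 100, 400):
    low, high = confidence_interval(estimate=70.0, standard_error=sd / sqrt(n))
    print(f"n = {n:3d}: ({low:.2f}, {high:.2f}), width {high - low:.2f}")

# Repeated-sampling interpretation: how often does the interval cover the true mean?
rng = random.Random(1)
true_mean, n_per_sample, repeats = 70.0, 30, 2000
covered = 0
for _ in range(repeats):
    sample = [rng.gauss(true_mean, sd) for _ in range(n_per_sample)]
    low, high = confidence_interval(sum(sample) / n_per_sample, sd / sqrt(n_per_sample))
    covered += low <= true_mean <= high
print(f"coverage: {covered / repeats:.3f}")  # close to 0.95
```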

27
Q

Confidence intervals versus p-values

A
  • Both can be used for significance testing
  • Both are based on the Central Limit Theorem and normal distribution
  • A low p-value is evidence against the null hypothesis, but the strength of this evidence is unclear
  • A p-value does not have to be combined with a significance level
  • Confidence intervals incorporate the effect size
  • Confidence intervals give a sense of uncertainty
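A sketch of the link between the two, using a normal approximation and two hypothetical estimates with the same standard error. "p < 0.05" and "the 95% CI excludes 0" lead to the same conclusion, but only the interval also shows how large the effect might be.

```python
from statistics import NormalDist

std_normal = NormalDist()
Z_CRIT = std_normal.inv_cdf(0.975)  # about 1.96


def p_value(estimate, se):
    """Two-sided p-value for H0: true value = 0 (normal approximation)."""
    return 2 * (1 - std_normal.cdf(abs(estimate / se)))


def ci95(estimate, se):
    """95% confidence interval using the same normal approximation."""
    return estimate - Z_CRIT * se, estimate + Z_CRIT * se


for estimate in (0.5, 0.3):  # hypothetical estimates, both with SE = 0.2
    low, high = ci95(estimate, se=0.2)
    p = p_value(estimate, se=0.2)
    print(f"estimate {estimate}: p = {p:.3f}, 95% CI ({low:.2f}, {high:.2f})")
```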
28
Q

difference OLS and logistic

A

OLS → continuous outcome
Logistic → dichotomous (binary) outcome
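In logistic regression the regression equation predicts the log-odds of the dichotomous outcome, which is then converted into a probability. A minimal sketch of that conversion (the log-odds values are arbitrary examples): any value of the linear predictor maps to a probability between 0 and 1.

```python
from math import exp


def logistic(log_odds):
    """Map a log-odds value (any real number) to a probability in (0, 1)."""
    return 1 / (1 + exp(-log_odds))


for log_odds in (-5, -1, 0, 1, 5):
    print(f"log-odds {log_odds:+d} -> probability {logistic(log_odds):.3f}")
```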

29
Q

estimated weight

A

the average weight for individuals with the same characteristics