Statistics V - Correlation, Causation, Regression Flashcards
Regression is a model of what?
A model of the influence of x on y.
What’s a multivariate regression?
Regression with
1 independent and
several dependent variables
(-> several univariate regressions)
What’s a multiple regression?
Regression with
several independent and
1 dependent variable.
What’s a multivariate multiple regression?
Regression with
several independent and
several dependent variables
What’s a regression with
1 independent and
several dependent variables
(-> several univariate regressions)?
A multivariate regression.
What’s a regression with
several independent and
1 dependent variable?
A multiple regression.
What’s a regression with
several independent and
several dependent variables?
A multivariate multiple regression.
What is the multivariate regression used for?
It is used for investigating the effects of one independent variable on several dependent variables.
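In a multiple regression: What does β_i stand for?
The partial regression coefficient of the i-th independent variable x_i: the expected change in y for a one-unit change in x_i, holding all other independent variables constant.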
Is the partial coefficient of regression equal to the bivariate coefficient of regression?
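No! (In general they differ; they coincide when x_i is uncorrelated with the other independent variables.)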
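A minimal simulation sketch of the difference, assuming a made-up data-generating process with two correlated predictors and true partial coefficients β1 = 1 and β2 = 2. The helper `ols` is hypothetical and just wraps NumPy least squares with an intercept.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# Hypothetical data: x2 is correlated with x1
x1 = rng.normal(size=n)
x2 = 0.8 * x1 + rng.normal(scale=0.6, size=n)
# True partial coefficients: beta1 = 1.0, beta2 = 2.0
y = 1.0 * x1 + 2.0 * x2 + rng.normal(size=n)


def ols(columns, target):
    """Least-squares slopes of target on the given columns (intercept dropped)."""
    X = np.column_stack([np.ones(len(target))] + columns)
    coef, *_ = np.linalg.lstsq(X, target, rcond=None)
    return coef[1:]


bivariate_b1 = ols([x1], y)[0]             # simple regression of y on x1 only
partial_b1, partial_b2 = ols([x1, x2], y)  # multiple regression of y on x1 and x2

print(f"bivariate b1 = {bivariate_b1:.2f}")  # near 1 + 2 * 0.8 = 2.6
print(f"partial   b1 = {partial_b1:.2f}")    # near 1.0
```

The bivariate slope absorbs the effect of x2 that travels through its correlation with x1; the partial coefficient does not.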
What can we learn from the following example?
Epidemiological studies showed that women who were taking combined hormone replacement therapy (HRT) also had a lower-than-average incidence of coronary heart disease (CHD), leading doctors to propose that HRT was protective against CHD. But controlled trials showed that HRT caused a small but significant increase in risk of CHD. Re-analysis of the data showed that women undertaking HRT were more likely to be from higher socio-economic groups (ABC1), with better than average diet and exercise regimes.
Correlation DOES NOT imply causation!
What is a confounding variable?
A confounding variable is an extraneous variable in a statistical model that correlates (positively or negatively) with both the dependent and the independent variable.
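A minimal sketch, assuming a hypothetical confounder z (think socio-economic status in the HRT card) that drives both x and y while x has no effect on y at all. The correlation between x and y is then produced entirely by z.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 10_000

# Hypothetical confounder z drives both x and y
z = rng.normal(size=n)
x = z + rng.normal(size=n)  # x depends on z only
y = z + rng.normal(size=n)  # y depends on z only; x has NO effect on y

r_xy = np.corrcoef(x, y)[0, 1]
print(f"corr(x, y) = {r_xy:.2f}")  # about 0.5 despite no causal link between x and y
```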
Other terms for confounding variable include…
… confounding factor, hidden variable, lurking variable, confound or confounder.
What is the Simpson paradox / Simpson reversal (a.k.a. Yule–Simpson effect)?
In probability and statistics, Simpson’s paradox, or the Yule–Simpson effect, is a paradox in which a trend that appears in different groups of data disappears when these groups are combined, and the reverse trend appears for the aggregate data.
Two examples of Simpson’s paradox?
Kidney Stone Treatment
Berkeley gender bias case
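The kidney stone reversal can be checked with plain arithmetic. The counts below are the ones usually quoted for this example (treatment A: open surgery, treatment B: percutaneous nephrolithotomy); A has the higher success rate for both small and large stones, yet B has the higher rate overall, because A was mostly given the harder large-stone cases.

```python
# (successes, patients) per treatment and stone size, as usually quoted
data = {
    "A": {"small": (81, 87), "large": (192, 263)},
    "B": {"small": (234, 270), "large": (55, 80)},
}


def rate(successes, patients):
    return successes / patients


overall = {}
for treatment, groups in data.items():
    for size, (s, n) in groups.items():
        print(f"{treatment} {size:5s}: {rate(s, n):.0%}")
    s_all = sum(s for s, _ in groups.values())
    n_all = sum(n for _, n in groups.values())
    overall[treatment] = rate(s_all, n_all)
    print(f"{treatment} total: {overall[treatment]:.0%}")
```

Run as written, this prints 93% / 73% / 78% for A and 87% / 69% / 83% for B.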
What does selection bias mean?
Selection bias is a statistical bias in which there is an error in choosing the individuals or groups to take part in a scientific study.
What is the Berkson’s paradox?
It is a form of selection bias: specifically, it arises when there is an ascertainment bias inherent in a study design.
The result is that two independent events become conditionally dependent (negatively dependent) given that at least one of them occurs.
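A minimal simulation sketch, assuming two hypothetical independent events a and b, each occurring with probability 0.3. Across the whole population they are uncorrelated; among only the cases where at least one occurs (e.g. patients who are admitted to hospital for either condition) they become clearly negatively correlated.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100_000

# Two independent events, each with probability 0.3 (hypothetical)
a = rng.random(n) < 0.3
b = rng.random(n) < 0.3


def corr(u, v):
    return np.corrcoef(u.astype(float), v.astype(float))[0, 1]


selected = a | b  # only cases where at least one event occurred are observed
r_all = corr(a, b)
r_selected = corr(a[selected], b[selected])

print(f"whole population: {r_all:+.2f}")       # about 0
print(f"selected (a or b): {r_selected:+.2f}")  # clearly negative (about -0.7)
```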
If you give a class of students a test on two successive days, the worst performers on the first day will tend to improve their scores on the second day, and the best performers on the first day will tend to do worse on the second day. The phenomenon occurs because …
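… each score is affected by random variation (day-to-day noise), not only by the student's true ability.
→ negative correlation between the first-day score and the change in score from day one to day two (the two scores themselves remain positively, but imperfectly, correlated)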
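A minimal simulation sketch of the two-day test, assuming a hypothetical model in which each observed score is a stable ability plus independent day-specific noise of the same size. The worst 10% on day one improve on average, the best 10% decline, and the first-day score correlates negatively with the change even though the two days' scores correlate positively.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 10_000

# Hypothetical model: observed score = stable ability + day-specific noise
ability = rng.normal(70, 10, size=n)
day1 = ability + rng.normal(0, 10, size=n)
day2 = ability + rng.normal(0, 10, size=n)
change = day2 - day1

worst = day1 < np.percentile(day1, 10)
best = day1 > np.percentile(day1, 90)

r_days = np.corrcoef(day1, day2)[0, 1]
r_change = np.corrcoef(day1, change)[0, 1]

print(f"mean change, worst 10%: {change[worst].mean():+.1f}")  # positive
print(f"mean change, best 10%:  {change[best].mean():+.1f}")   # negative
print(f"corr(day1, day2):   {r_days:+.2f}")    # about +0.5
print(f"corr(day1, change): {r_change:+.2f}")  # about -0.5
```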
What is meant by regression to the mean?
If a pair of independent measurements are made from the same distribution, samples far from the mean on the first measurement will tend to be closer to the mean on the second one. Moreover, the farther from the mean on the first measurement, the stronger the effect is.
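As a worked equation, assume the two measurements X_1 and X_2 are jointly normal with common mean μ, common variance and correlation ρ (the normality is an added assumption that makes the conditional mean linear). The expected change is then proportional to the first measurement's distance from the mean:

```latex
\begin{aligned}
\mathbb{E}[X_2 \mid X_1 = x] &= \mu + \rho\,(x - \mu) \\
\mathbb{E}[X_2 - X_1 \mid X_1 = x] &= -(1 - \rho)\,(x - \mu) \\
\rho = 0 \ \text{(independent measurements)}: \quad
\mathbb{E}[X_2 - X_1 \mid X_1 = x] &= \mu - x
\end{aligned}
```

For independent measurements the last line needs no normality at all, since E[X_2 | X_1] = E[X_2] = μ: a first value 20 points above the mean is expected to fall by 20 points, one 40 points above by 40. The two-day test corresponds to 0 < ρ < 1, where only part of the distance is recovered.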
What is meant by Reichenbach’s “no correlation without causation”?
If any two variables are dependent, then one is the cause of the other or (!) there is a third variable causing both.
Hume (18th century) on causality:
We can only perceive post hoc (one thing after another), never propter hoc (one thing because of another).
Describe the pattern of the post hoc fallacy!
The form of the post hoc fallacy can be expressed as follows:
A occurred, then B occurred. Therefore, A caused B.
When B is undesirable, this pattern is often extended in reverse: Avoiding A will prevent B.
Kant (18th century) on causality:
Agrees with Hume: causality derives from reasoning, not from perception.
Explain spurious correlation!
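A spurious correlation is a statistical association between two variables that are not causally related, typically because both are driven by a third (confounding) variable or by coincidence. Spurious correlation does not imply causality.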
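A minimal sketch of coincidental spurious correlation, assuming independent series of 100 steps: pairs of independent white-noise series almost never reach |r| > 0.5, but pairs of independent random walks (trending series such as cumulative sums) do so often. The helper names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(4)
n_steps, n_pairs = 100, 500


def share_strong(make_series):
    """Fraction of pairs of independent series whose correlation has |r| > 0.5."""
    hits = 0
    for _ in range(n_pairs):
        u, v = make_series(), make_series()
        if abs(np.corrcoef(u, v)[0, 1]) > 0.5:
            hits += 1
    return hits / n_pairs


def white_noise():
    return rng.normal(size=n_steps)


def random_walk():
    return np.cumsum(rng.normal(size=n_steps))  # independent trending series


noise_share = share_strong(white_noise)
walk_share = share_strong(random_walk)
print(f"white noise pairs with |r| > 0.5: {noise_share:.1%}")  # essentially 0
print(f"random walk pairs with |r| > 0.5: {walk_share:.1%}")   # substantial
```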
Rules for causality according to Davis (1985):
A causes B if:
- A starts before B
- A comes in a known order before B
- A is more stable, more difficult to influence or more influential.
To control variables, Sir Ronald Fisher suggests …
… randomization included in the experimental design.
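A minimal simulation sketch of why randomization helps, loosely mirroring the HRT card with made-up numbers: a hypothetical treatment slightly raises a risk score (true effect +0.5), while a confounder ("ses") lowers the risk score and also makes people more likely to take the treatment. The naive treated-minus-untreated comparison is biased in the observational setting but not when a coin flip assigns treatment.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 100_000
true_effect = 0.5  # hypothetical: treatment RAISES the risk score by 0.5

# Hypothetical confounder (e.g. socio-economic status) that lowers the risk score
ses = rng.normal(size=n)


def outcome(treated):
    return true_effect * treated - 2.0 * ses + rng.normal(size=n)


def naive_effect(treated, y):
    """Difference in mean outcome between treated and untreated."""
    return y[treated == 1].mean() - y[treated == 0].mean()


# Observational: higher-ses people are more likely to be treated
observed = (ses + rng.normal(size=n) > 0).astype(float)
# Randomized: a coin flip decides, independent of ses
randomized = (rng.random(n) < 0.5).astype(float)

obs_est = naive_effect(observed, outcome(observed))
rct_est = naive_effect(randomized, outcome(randomized))
print(f"observational estimate: {obs_est:+.2f}")  # negative: looks protective
print(f"randomized estimate:    {rct_est:+.2f}")  # close to the true +0.5
```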
Statistical means to control for variables:
- residuals of regression
- repetition among control categories
- weighting of data
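The first of these means, residuals of regression, can be sketched as follows, assuming hypothetical data in which z drives both x and y and x has no direct effect on y. Regressing x and y on z separately and correlating the residuals (a partial correlation) removes the association that z induces.

```python
import numpy as np

rng = np.random.default_rng(6)
n = 10_000

# Hypothetical data: z drives both x and y; no direct link between x and y
z = rng.normal(size=n)
x = 0.8 * z + rng.normal(size=n)
y = 0.8 * z + rng.normal(size=n)


def residuals(target, control):
    """Residuals of an OLS regression of target on control (with intercept)."""
    X = np.column_stack([np.ones(len(control)), control])
    coef, *_ = np.linalg.lstsq(X, target, rcond=None)
    return target - X @ coef


raw_r = np.corrcoef(x, y)[0, 1]
partial_r = np.corrcoef(residuals(x, z), residuals(y, z))[0, 1]
print(f"raw corr(x, y):        {raw_r:+.2f}")      # about +0.39
print(f"corr after removing z: {partial_r:+.2f}")  # about 0
```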
Controlling for prior and intervening variables can lead to the …
… direct effect.
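A minimal sketch of isolating a direct effect by controlling for an intervening variable, assuming a hypothetical linear path model x → m → y (indirect) plus x → y (direct). Regressing y on x alone gives the total effect c + a·b; adding the intervening variable m as a second predictor leaves roughly the direct effect c. The coefficients a, b, c are made up.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 10_000

# Hypothetical path coefficients: x->m (a), m->y (b), direct x->y (c)
a, b, c = 0.5, 2.0, 1.0
x = rng.normal(size=n)
m = a * x + rng.normal(size=n)          # intervening variable
y = c * x + b * m + rng.normal(size=n)


def slopes(columns, target):
    """OLS slopes of target on the given columns (intercept dropped)."""
    X = np.column_stack([np.ones(len(target))] + columns)
    coef, *_ = np.linalg.lstsq(X, target, rcond=None)
    return coef[1:]


total = slopes([x], y)[0]      # about c + a * b = 2.0
direct = slopes([x, m], y)[0]  # about c = 1.0
print(f"total effect:  {total:.2f}")
print(f"direct effect: {direct:.2f}")
```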