Statistics V - Correlation, Causation, Regression Flashcards

1
Q

Regression is a model of what?

A

A model of the influence of an independent variable x on a dependent variable y.
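As a minimal sketch, assuming ordinary least squares (OLS) with a single independent variable, the intercept a and slope b of y = a + b·x can be computed from the sample covariance and variance. The function name and the data points are made up for illustration.

```python
def simple_regression(x, y):
    """Fit y = a + b*x by ordinary least squares; return (a, b)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # slope = covariance(x, y) / variance(x); the common 1/n factors cancel
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b

# hypothetical data lying exactly on y = 1 + 2x
x = [0, 1, 2, 3, 4]
y = [1, 3, 5, 7, 9]
a, b = simple_regression(x, y)
print(a, b)  # 1.0 2.0
```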

2
Q

What’s a multivariate regression?

A

Regression with
1 independent and
several dependent variables
(-> several univariate regressions)
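The note that this amounts to several univariate regressions can be checked numerically. A minimal sketch, assuming OLS and simulated (made-up) data: fitting all dependent variables at once with NumPy's `np.linalg.lstsq` gives the same coefficients as fitting each dependent variable separately.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 200
x = rng.normal(size=n)                      # one independent variable
# two hypothetical dependent variables, each linear in x plus noise
Y = np.column_stack([
    1.0 + 2.0 * x + rng.normal(scale=0.5, size=n),
    -3.0 + 0.5 * x + rng.normal(scale=0.5, size=n),
])

X = np.column_stack([np.ones(n), x])        # design matrix with intercept column

# joint fit: one least-squares problem with a matrix of dependent variables
B_joint, *_ = np.linalg.lstsq(X, Y, rcond=None)

# separate fits: one univariate regression per dependent variable
B_separate = np.column_stack(
    [np.linalg.lstsq(X, Y[:, j], rcond=None)[0] for j in range(Y.shape[1])]
)

print(np.allclose(B_joint, B_separate))    # True
```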

3
Q

What’s a multiple regression?

A

Regression with
several independent and
1 dependent variable.

4
Q

What’s a multivariate multiple regression?

A

Regression with
several independent and
several dependent variables

5
Q

What’s a regression with
1 independent and
several dependent variables
(-> several univariate regressions)?

A

A multivariate regression.

6
Q

What’s a regression with
several independent and
1 dependent variable?

A

A multiple regression.

7
Q

What’s a regression with
several independent and
several dependent variables?

A

A multivariate multiple regression.

8
Q

What is the multivariate regression used for?

A

It is used for investigating the effects of one independent variable on several dependent variables.

9
Q

In a multiple regression: What does β_i stand for?

A

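The partial regression coefficient of the i-th independent variable x_i, i.e. its effect on y when the other independent variables are held constant.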

10
Q

Is the partial coefficient of regression equal to the bivariate coefficient of regression?

A

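No! In general they differ; they coincide, for example, when x_i is uncorrelated with the other independent variables.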
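A small simulation shows the difference. As a sketch, assuming OLS and made-up linear relationships in which x2 is correlated with x1: the bivariate slope of y on x1 also absorbs part of the effect of x2, while the partial coefficient from the multiple regression recovers the effect of x1 alone.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5000
x1 = rng.normal(size=n)
x2 = 0.8 * x1 + rng.normal(scale=0.6, size=n)   # x2 is correlated with x1
y = 1.0 * x1 + 2.0 * x2 + rng.normal(size=n)    # hypothetical true partial effects 1.0 and 2.0

# bivariate regression of y on x1 only
X_biv = np.column_stack([np.ones(n), x1])
b_biv = np.linalg.lstsq(X_biv, y, rcond=None)[0][1]

# multiple regression of y on x1 and x2: the coefficient of x1 is partial
X_mult = np.column_stack([np.ones(n), x1, x2])
b_partial = np.linalg.lstsq(X_mult, y, rcond=None)[0][1]

# bivariate slope near 1 + 2 * 0.8 = 2.6, partial slope near 1.0
print(round(b_biv, 2), round(b_partial, 2))
```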

11
Q

What can we learn from the following example?
Epidemiological studies showed that women who were taking combined hormone replacement therapy (HRT) also had a lower-than-average incidence of coronary heart disease (CHD), leading doctors to propose that HRT was protective against CHD. But controlled trials showed that HRT caused a small but significant increase in risk of CHD. Re-analysis of the data showed that women undertaking HRT were more likely to be from higher socio-economic groups (ABC1), with better-than-average diet and exercise regimes.

A

Correlation DOES NOT imply causation! (Here socio-economic status acted as a confounding variable.)

12
Q

What is a confounding variable?

A

A confounding variable is an extraneous variable in a statistical model that correlates (positively or negatively) with both the dependent and the independent variable.
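A small simulation makes the definition concrete. As a sketch, assuming linear relationships and normally distributed noise (all numbers made up), a confounder z drives both x and y while x has no effect on y at all; x and y are still clearly correlated.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 10_000
z = rng.normal(size=n)            # hypothetical confounder
x = z + rng.normal(size=n)        # x depends on z only
y = z + rng.normal(size=n)        # y depends on z only; x has no effect on y

r_xy = np.corrcoef(x, y)[0, 1]
print(round(r_xy, 2))             # close to 0.5 despite no causal link x -> y
```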

13
Q

Other terms for confounding variable include…

A

… confounding factor, hidden variable, lurking variable, confound or confounder.

14
Q

What is Simpson’s paradox / Simpson reversal (a.k.a. Yule–Simpson effect)?

A

In probability and statistics, Simpson’s paradox, or the Yule–Simpson effect, is a paradox in which a trend that appears in different groups of data disappears when these groups are combined, and the reverse trend appears for the aggregate data.

15
Q

Two examples of Simpson’s paradox?

A

Kidney Stone Treatment

Berkeley gender bias case
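The reversal can be reproduced with a few lines of arithmetic. The counts below (successful treatments, patients) are the ones usually quoted for the kidney stone treatment example, split by stone size; treat them as illustrative. Treatment A wins within both groups, yet B wins once the groups are pooled.

```python
from fractions import Fraction

# (successes, patients) per treatment and stone size
data = {
    "A": {"small": (81, 87), "large": (192, 263)},
    "B": {"small": (234, 270), "large": (55, 80)},
}

def rate(successes, patients):
    """Exact success rate as a fraction."""
    return Fraction(successes, patients)

def pooled(treatment):
    """Success rate after combining both stone-size groups."""
    s = sum(v[0] for v in data[treatment].values())
    n = sum(v[1] for v in data[treatment].values())
    return Fraction(s, n)

for group in ("small", "large"):
    ra, rb = rate(*data["A"][group]), rate(*data["B"][group])
    print(group, "A better" if ra > rb else "B better")

print("pooled", "A better" if pooled("A") > pooled("B") else "B better")
# prints: small A better / large A better / pooled B better
```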

16
Q

What does selection bias mean?

A

Selection bias is a statistical bias in which there is an error in choosing the individuals or groups to take part in a scientific study.

17
Q

What is Berkson’s paradox?

A

Berkson’s paradox is a form of selection bias: it arises when there is an ascertainment bias inherent in a study design.
The result is that two independent events become conditionally dependent (negatively dependent) given that at least one of them occurs.
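As a minimal sketch with exact fractions (the probabilities are assumptions chosen for illustration): take two independent events A and B, each with probability 1/2, and keep only the cases where at least one of them occurred, as a study that only ascertains such cases would. In that selected population P(A and B) is smaller than P(A)·P(B), so the events are negatively dependent.

```python
from fractions import Fraction
from itertools import product

def joint(p_a, p_b):
    """Joint distribution of two independent events A and B."""
    return {(a, b): (p_a if a else 1 - p_a) * (p_b if b else 1 - p_b)
            for a, b in product([True, False], repeat=2)}

def condition_on_union(dist):
    """Keep only outcomes where at least one event occurred, then renormalize."""
    kept = {k: v for k, v in dist.items() if k[0] or k[1]}
    total = sum(kept.values())
    return {k: v / total for k, v in kept.items()}

half = Fraction(1, 2)
cond = condition_on_union(joint(half, half))
p_a = sum(v for (a, _), v in cond.items() if a)
p_b = sum(v for (_, b), v in cond.items() if b)
p_ab = cond[(True, True)]
print(p_a, p_b, p_ab, p_a * p_b)  # 2/3 2/3 1/3 4/9
```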

18
Q

If you give a class of students a test on two successive days, the worst performers on the first day will tend to improve their scores on the second day, and the best performers on the first day will tend to do worse on the second day. The phenomenon occurs because …

A

… each score is determined partly by underlying ability and partly by chance (random variation).

→ negative correlation between the first-day score and the change in score from the first to the second day
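A simulation of the two-day test illustrates this. As a sketch, assuming each score is a hypothetical stable ability plus independent normal noise on each day (all parameters made up), the bottom decile on day 1 improves on average, the top decile gets worse, and the change is negatively correlated with the day-1 score.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000
ability = rng.normal(70, 10, n)          # hypothetical stable ability
day1 = ability + rng.normal(0, 10, n)    # chance component on day 1
day2 = ability + rng.normal(0, 10, n)    # independent chance component on day 2

worst = day1 <= np.quantile(day1, 0.1)   # bottom 10 % on day 1
best = day1 >= np.quantile(day1, 0.9)    # top 10 % on day 1

change_worst = (day2[worst] - day1[worst]).mean()
change_best = (day2[best] - day1[best]).mean()
r_change = np.corrcoef(day1, day2 - day1)[0, 1]
print(change_worst > 0, change_best < 0, r_change < 0)   # True True True
```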

19
Q

What is meant by regression to the mean?

A

If a pair of independent measurements are made from the same distribution, samples far from the mean on the first measurement will tend to be closer to the mean on the second one. Moreover, the farther from the mean on the first measurement, the stronger the effect is.
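Why the effect grows with the distance from the mean can be seen under an assumed bivariate normal model (an assumption not made in the card): both measurements have mean μ and standard deviation σ, and ρ is the correlation between them. The expected second value and the expected move toward the mean are then:

```latex
E[X_2 \mid X_1 = x] = \mu + \rho\,(x - \mu),
\qquad
x - E[X_2 \mid X_1 = x] = (1 - \rho)\,(x - \mu)
```

For ρ < 1 the expected move toward μ is proportional to |x − μ|; for independent measurements (ρ = 0) the expected second value is μ regardless of x.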

20
Q

What is meant by Reichenbach’s “no correlation without causation”?

A

If any two variables are dependent, then one is the cause of the other or (!) there is a third variable causing both.

21
Q

Hume (18th century) on causality:

A

We can only perceive post hoc (one after the other), never propter hoc (because of one another).

22
Q

Describe the pattern of the post hoc fallacy!

A

The form of the post hoc fallacy can be expressed as follows:

    A occurred, then B occurred.
    Therefore, A caused B.

When B is undesirable, this pattern is often extended in reverse: Avoiding A will prevent B.

23
Q

Kant (18th century) on causality:

A

Agrees with Hume: Causality derives from reasoning, not from perception.

24
Q

Explain spurious correlation!

A

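A spurious correlation is a correlation between two variables that are not causally related to each other; it arises by coincidence or through a third (confounding) variable. Spurious correlation does not imply causality.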

25
Q

Rules for causality according to Davis (1985):

A

A causes B if:

  • A starts before B
  • A comes in a known order before B
  • A is more stable, more difficult to influence or more influential.
26
Q

To control variables, Sir Ronald Fisher suggests …

A

… randomization included in the experimental design.
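A minimal sketch of why randomization controls variables: assignment is made by chance, so it cannot depend on a confounder, and the confounder ends up balanced between groups on average. The covariate `ses` (standing in for something like socio-economic status) and all numbers are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 10_000
ses = rng.normal(size=n)               # hypothetical confounder

# randomization: shuffle the subjects and put the first half in the treatment group
order = rng.permutation(n)
treated = np.zeros(n, dtype=bool)
treated[order[: n // 2]] = True

diff = ses[treated].mean() - ses[~treated].mean()
print(treated.sum(), round(diff, 3))   # 5000 and a difference close to 0
```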

27
Q

Statistical means to control for variables:

A
  • residuals of regression
  • repetition among control categories
  • weighting of data
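The first of these means, controlling via residuals of regression, can be sketched as follows, assuming linear relationships and made-up data in which a control variable z drives both x and y. Regressing x and y on z and correlating the residuals (a partial correlation) removes the association that z created; the helper `residuals` is hypothetical.

```python
import numpy as np

def residuals(target, control):
    """Residuals of an OLS regression of target on control (with intercept)."""
    X = np.column_stack([np.ones(len(control)), control])
    coef, *_ = np.linalg.lstsq(X, target, rcond=None)
    return target - X @ coef

rng = np.random.default_rng(7)
n = 10_000
z = rng.normal(size=n)                 # hypothetical control variable
x = z + rng.normal(size=n)
y = z + rng.normal(size=n)             # x has no effect on y

raw = np.corrcoef(x, y)[0, 1]
controlled = np.corrcoef(residuals(x, z), residuals(y, z))[0, 1]
print(round(raw, 2), round(controlled, 2))   # about 0.5 vs. about 0.0
```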
28
Q

Controlling for prior and intervening variables can lead to the …

A

… direct effect.
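For the intervening-variable case, a sketch under assumed linear effects (made-up coefficients): x affects y directly and also through an intervening variable m. Regressing y on x alone gives the total effect; adding m as a control leaves the direct effect of x. The prior-variable case is not shown.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 10_000
x = rng.normal(size=n)
m = 0.7 * x + rng.normal(size=n)               # hypothetical intervening variable
y = 0.3 * x + 1.0 * m + rng.normal(size=n)     # direct effect 0.3, indirect 0.7 * 1.0

ones = np.ones(n)
total = np.linalg.lstsq(np.column_stack([ones, x]), y, rcond=None)[0][1]
direct = np.linalg.lstsq(np.column_stack([ones, x, m]), y, rcond=None)[0][1]
print(round(total, 2), round(direct, 2))       # about 1.0 (total) vs. about 0.3 (direct)
```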