L11 Correlation Flashcards

1
Q

Define ‘correlation’.

A

A quantitative measure of the degree to which two random variables (continuous or ordinal) are related, on the assumption that their relationship is linear.

  • i.e. correlation analysis assumes a linear (straight-line) relationship between the two variables
  • Measures the linear relationship between two variables x and y observed on n individuals

The first step before performing correlation analysis is to construct a scatter plot of y against x.
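For reference, the standard sample Pearson coefficient for n pairs (xi, yi) can be written as follows. The notation x̄ and ȳ for the sample means of x and y is an addition, not taken from the card:

```latex
r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
         {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 \, \sum_{i=1}^{n} (y_i - \bar{y})^2}}
```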

2
Q

What is the purpose of a scatter plot in correlation analysis?

A

A scatter plot is needed to visually examine whether a relationship exists between two numerical variables (and whether it looks linear) before performing correlation analysis.
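To illustrate what the visual check can reveal, here is a minimal sketch of a crude character-based scatter plot. The function name `text_scatter` and its grid sizes are hypothetical; in practice a plotting library (for example `matplotlib.pyplot.scatter`) would be used. The sample data are made up and follow a U shape that a straight line cannot describe:

```python
def text_scatter(x, y, width=30, height=10):
    """Return rows of a crude character scatter plot of y against x."""
    xmin, xmax = min(x), max(x)
    ymin, ymax = min(y), max(y)
    grid = [[" "] * width for _ in range(height)]
    for xi, yi in zip(x, y):
        # Map each value onto a grid cell ("or 1" guards against a zero range).
        col = round((xi - xmin) / ((xmax - xmin) or 1) * (width - 1))
        row = round((yi - ymin) / ((ymax - ymin) or 1) * (height - 1))
        grid[height - 1 - row][col] = "*"  # top text row holds the largest y
    return ["|" + "".join(r) for r in grid] + ["+" + "-" * width]


x = [-3, -2, -1, 0, 1, 2, 3]
y = [xi ** 2 for xi in x]  # hypothetical non-linear (U-shaped) data
for line in text_scatter(x, y, width=7, height=5):
    print(line)
```

The printed points form a "U", which is exactly the kind of pattern the scatter plot is meant to catch before any coefficient is computed.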

3
Q

Explain the significance of the correlation coefficient.

A

A quantitative measure of the strength and direction of the linear relationship between two variables

  • r and rs are dimensionless (i.e. unitless)
  • Range of possible values is -1 to 1

Sign of r or rs indicates the direction of the linear relationship between two variables.

  • r or rs > 0 means positive correlation
  • r or rs < 0 means negative correlation
  • r or rs = 1 means perfect positive correlation
  • r or rs = -1 means perfect negative correlation
  • r or rs = 0 means no correlation (no linear relationship)

The magnitude of r or rs indicates the strength of the relationship between the two variables

  • |r| or |rs| < 0.5: Weak linear relationship
  • |r| or |rs| from 0.5 to 0.7: Strong linear relationship
  • |r| or |rs| from 0.7 to 1.0: Very strong linear relationship
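The sign and magnitude rules on this card can be condensed into a small helper. This is a sketch only: the function name `describe_correlation` is hypothetical, and because the card's ranges share their endpoints, counting exactly 0.5 as "strong" and exactly 0.7 as "very strong" is an assumption:

```python
def describe_correlation(r):
    """Describe r (or rs) using the sign and magnitude rules on this card."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("a correlation coefficient must lie in [-1, 1]")
    if r == 0:
        return "no correlation"
    direction = "positive" if r > 0 else "negative"
    size = abs(r)
    if size == 1:
        return f"perfect {direction}"
    # Assumed boundary handling: 0.5 counts as strong, 0.7 as very strong.
    if size < 0.5:
        strength = "weak"
    elif size < 0.7:
        strength = "strong"
    else:
        strength = "very strong"
    return f"{strength} {direction}"


print(describe_correlation(-0.791))
print(describe_correlation(0.3))
```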
4
Q

Which type of statistical test is used to examine the correlation between two numerical continuous variables, both of which are normally distributed?

A

Parametric test: Pearson product-moment correlation (r)

- Used when BOTH variables are continuous and normally distributed
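The Pearson coefficient can be computed directly from deviations about the means. The following is a minimal from-scratch sketch; the function name `pearson_r` is hypothetical, and it does not check the normality assumption, which still has to be assessed separately:

```python
import math


def pearson_r(x, y):
    """Sample Pearson product-moment correlation r of paired values."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length samples with at least 2 pairs")
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Cross-products and sums of squares of deviations from the means.
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when either variable is constant")
    return sxy / math.sqrt(sxx * syy)


print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))  # 1.0: perfect positive
print(pearson_r([1, 2, 3, 4], [8, 6, 4, 2]))  # -1.0: perfect negative
```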

5
Q

Which type of statistical test is used to examine the correlation between two numerical continuous variables, where at least one of them is NOT normally distributed?

A

Non-parametric test: Spearman rank correlation (rs)

- Used when one or both variables are continuous but NON-normally distributed, or are ordinal

6
Q

Which type of statistical test is used to examine the correlation between two numerical variables, where at least one of them is ordinal?

A

Non-parametric test: Spearman rank correlation (rs)

- Used when one or both variables are continuous but NON-normally distributed, or are ordinal
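The choice between the two methods on these cards reduces to a simple rule. Here is a sketch of it; the function and parameter names are hypothetical, and deciding whether a variable is "normally distributed" is left to the analyst:

```python
def choose_correlation(x_is_normal, y_is_normal, any_ordinal=False):
    """Pick a correlation method following the rules on these cards."""
    # Spearman if either variable is ordinal or either is non-normal.
    if any_ordinal or not (x_is_normal and y_is_normal):
        return "Spearman rank correlation (rs)"
    # Pearson only when both are continuous and normally distributed.
    return "Pearson product-moment correlation (r)"


print(choose_correlation(True, True))
print(choose_correlation(True, False))
print(choose_correlation(True, True, any_ordinal=True))
```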

7
Q

What are some potential misuses of the correlation coefficient?

A

It may not be useful if the underlying relationship is NON-linear.

  • r or rs = 0 does NOT necessarily mean there is NO relationship between the variables, JUST no linear relationship
  • A relationship may be present but non-linear.
  • A high r or rs value does NOT necessarily imply linearity

ALWAYS construct a scatter plot to look for patterns first before performing correlation analysis!
- Do NOT perform correlation analysis when scatter plots show NON-linear patterns.

8
Q

What are some potential misinterpretations of the correlation coefficient?

A

Correlation does NOT necessarily imply causation (cause-and-effect relationship)!!
- A correlation between x and y does NOT necessarily imply that x causes y, since confounders may be present.

9
Q

State the purpose behind the hypothesis testing of correlation analysis.

A

To test the null hypothesis (H0) that there is no correlation between the two numerical variables in the population (i.e. the population correlation coefficient ρ = 0).
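For Pearson's r, the usual test of H0: ρ = 0 converts r into a t statistic with n − 2 degrees of freedom, t = r·√(n − 2) / √(1 − r²). H0 is rejected when |t| exceeds the two-sided critical value of t at the chosen significance level; statistical software such as `scipy.stats.pearsonr` reports the corresponding p-value directly. A minimal sketch of the statistic (the function name is hypothetical):

```python
import math


def correlation_t_statistic(r, n):
    """t statistic and degrees of freedom for testing H0: rho = 0."""
    if n < 3:
        raise ValueError("need at least 3 pairs of observations")
    if abs(r) >= 1:
        raise ValueError("t is undefined when |r| = 1")
    t = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)
    return t, n - 2


t, df = correlation_t_statistic(0.5, 27)  # made-up r and n for illustration
print(round(t, 2), df)
```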

10
Q

State the assumptions when using Pearson product-moment correlation analysis.

A

1) The x and y values are independent.
2) The pairs of observations (xi, yi) are randomly selected.
3) The underlying populations of both variables are normally distributed.

11
Q

State the assumptions when using Spearman rank correlation analysis.

A

1) The x and y values are independent.

2) The pairs of observations (xi, yi) are randomly selected.

12
Q

Give an example of how to write the conclusion of a correlation analysis.

A

At the 0.05 significance level, there is a statistically significant negative correlation (r = -0.791, p < 0.0005) between the percentage of children in a given country who have been immunised against the infectious diseases diphtheria, pertussis, and tetanus (DPT) and that country's under-5 mortality rate.

13
Q

What is an advantage of using the Spearman rank correlation over the Pearson product-moment correlation?

A

It is less sensitive to outlying values (outliers) than the Pearson product-moment correlation.

14
Q

How similar or different is performing the Spearman rank correlation as compared to the Pearson product-moment correlation?

A

Performing the Spearman rank correlation is equivalent to performing the Pearson product-moment correlation on the ranked values of x and y (i.e. using assigned ranks rather than the actual observations).
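When there are no tied values, applying the Pearson formula to the ranks simplifies to the familiar shortcut below, where d_i is the difference between the ranks of x_i and y_i. With ties, the Pearson formula applied to average ranks should be used instead:

```latex
r_s = 1 - \frac{6 \sum_{i=1}^{n} d_i^2}{n(n^2 - 1)},
\qquad d_i = \operatorname{rank}(x_i) - \operatorname{rank}(y_i)
```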
