L11 Correlation Flashcards
Define ‘correlation’.
Quantification of the degree to which two random variables (continuous/ordinal) are related, provided their relationship is linear.
- i.e. correlation analysis assumes a linear (straight-line) relationship between the two variables
- Measures the linear relationship between two variables x and y, each measured on the same n individuals
The first step before performing correlation analysis is to construct a scatter plot of y against x.
What is the purpose of a scatter plot in correlation analysis?
A scatter plot lets you visually examine whether a relationship exists between two numerical variables, and whether it looks linear, before performing correlation analysis.
Explain the significance of the correlation coefficient.
Quantitative measure of strength and direction of a linear relationship between two variables
- r and rs are dimensionless (i.e. unitless)
- Range of possible values is -1 to 1
Sign of r or rs indicates the direction of the linear relationship between two variables.
- r or rs > 0 means positive correlation
- r or rs < 0 means negative correlation
- r or rs = 1 means perfect positive correlation
- r or rs = -1 means perfect negative correlation
- r or rs = 0 means no correlation (no linear relationship)
Magnitude of r or rs indicates the strength of the linear relationship between two variables
- |r| or |rs| < 0.5: Weak linear relationship
- |r| or |rs| from 0.5 to 0.7: Strong linear relationship
- |r| or |rs| from 0.7 to 1.0: Very strong linear relationship
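To make the sign and magnitude rules concrete, here is a minimal sketch that computes Pearson's r from its definition (sum of cross-products divided by the square root of the product of the sums of squares) and labels it with the cut-offs listed above. The function names `pearson_r` and `describe` are hypothetical, and the choice to put the boundary values 0.5 and 0.7 into the higher category is an assumption, since the cut-offs above overlap at their edges.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation coefficient r for paired samples."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must be paired samples with at least 2 pairs")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sums of squares about the means
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when either variable is constant")
    return sxy / math.sqrt(sxx * syy)


def describe(r):
    """Label r (or rs) with the flashcard cut-offs.

    Assumption: boundary values 0.5 and 0.7 go to the higher category.
    """
    if r == 0:
        return "no linear relationship"
    direction = "positive" if r > 0 else "negative"
    size = abs(r)  # strength depends only on magnitude, not sign
    if math.isclose(size, 1.0):
        strength = "perfect"
    elif size >= 0.7:
        strength = "very strong"
    elif size >= 0.5:
        strength = "strong"
    else:
        strength = "weak"
    return f"{strength} {direction}"


r = pearson_r([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(round(r, 3), describe(r))
```

Note that r is unitless: rescaling x or y (e.g. changing units) leaves the result unchanged.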
Which type of statistical test is used to examine the correlation between two numerical continuous variables, both of which are normally distributed?
Parametric test: Pearson product-moment correlation (r)
- Used when BOTH variables are continuous and normally distributed
Which type of statistical test is used to examine the correlation between two numerical continuous variables, where at least one of them is NOT normally distributed?
Non-parametric test: Spearman rank correlation (rs)
- Used when one or both variables are continuous but NON-normally distributed, or are ordinal
Which type of statistical test is used to examine the correlation between two variables, where at least one of them is ordinal?
Non-parametric test: Spearman rank correlation (rs)
- Used when one or both variables are continuous but NON-normally distributed, or are ordinal
What are some potential misuses of the correlation coefficient?
May not be useful if underlying relationship is NON-linear.
- r or rs = 0 does NOT necessarily mean there is NO relationship between variables, JUST NO linear relationship
- Relationships may be present, but non-linear.
- High r or rs value does NOT necessarily imply “linearity”
ALWAYS construct a scatter plot to look for patterns first before performing correlation analysis!
- Do NOT perform correlation analysis when scatter plots show NON-linear patterns.
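Both misuses above can be reproduced with a small sketch. A symmetric U-shape (y = x² for x from -3 to 3) gives r = 0 even though y is completely determined by x, while a curved but monotonic pattern (y = x² for x from 1 to 10) gives r close to 1 even though the relationship is not linear. The data are made up purely for illustration, and `pearson_r` is a hypothetical helper written from the definition of r.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation coefficient r for paired samples."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Symmetric U-shape: a perfect (but non-linear) relationship, yet r = 0
x_u = [-3, -2, -1, 0, 1, 2, 3]
y_u = [xi ** 2 for xi in x_u]
r_u = pearson_r(x_u, y_u)

# Curved but monotonic: r is high even though the pattern is not a straight line
x_c = list(range(1, 11))
y_c = [xi ** 2 for xi in x_c]
r_c = pearson_r(x_c, y_c)

print(f"U-shape: r = {r_u:.3f}")
print(f"Curve:   r = {r_c:.3f}")
```

Neither number alone reveals the curvature; only the scatter plot does.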
What are some potential misinterpretations of the correlation coefficient?
Correlation does NOT necessarily imply causation (cause-and-effect relationship)!!
- A correlation between x and y does NOT necessarily imply that x caused y, since confounders may be present.
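A small simulation can show how a confounder produces correlation without causation. In this sketch, a hypothetical confounder z drives both x and y, and neither x nor y depends on the other, yet r(x, y) comes out close to 1. Because the data are simulated, z's contribution can be subtracted exactly, and the remaining noise terms are essentially uncorrelated. The seed, noise levels, and the helper `pearson_r` are all illustrative assumptions.

```python
import math
import random


def pearson_r(x, y):
    """Pearson product-moment correlation coefficient r for paired samples."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


rng = random.Random(42)  # fixed seed so the simulation is reproducible

# Hypothetical confounder z that influences both x and y
z = [float(i) for i in range(200)]
# x and y each depend on z plus independent noise; x does not cause y or vice versa
x = [zi + rng.gauss(0, 5) for zi in z]
y = [2 * zi + rng.gauss(0, 10) for zi in z]

r_xy = pearson_r(x, y)

# Since we built the data, we can remove z's effect exactly and correlate what is left
x_resid = [xi - zi for xi, zi in zip(x, z)]
y_resid = [yi - 2 * zi for yi, zi in zip(y, z)]
r_resid = pearson_r(x_resid, y_resid)

print(f"r(x, y) = {r_xy:.3f}")
print(f"r after removing z = {r_resid:.3f}")
```

With real data the confounder is usually unknown or unmeasured, which is exactly why a correlation alone cannot establish causation.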
State the purpose behind the hypothesis testing of correlation analysis.
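To test H0 that there is no correlation between the two numerical variables in the population (population correlation coefficient ρ = 0), against the alternative H1 that ρ ≠ 0.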
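For Pearson's r, this H0 is usually tested with the statistic t = r·√((n - 2)/(1 - r²)), which follows a t distribution with n - 2 degrees of freedom under H0 when the Pearson assumptions hold. The sketch below only computes t and the degrees of freedom; the p-value would then come from a t table or, for example, `2 * scipy.stats.t.sf(abs(t), df)` (SciPy is not used here so the block stays standard-library only). The function name and the values r = 0.5, n = 27 are hypothetical.

```python
import math


def correlation_t_statistic(r, n):
    """Return (t, df) for testing H0: rho = 0 from a sample correlation r on n pairs."""
    if n < 3:
        raise ValueError("need at least 3 pairs of observations")
    if not -1 < r < 1:
        raise ValueError("|r| must be strictly less than 1")
    df = n - 2
    t = r * math.sqrt(df / (1 - r ** 2))
    return t, df


t, df = correlation_t_statistic(0.5, 27)
print(f"t = {t:.3f} on {df} degrees of freedom")
```

A larger |r| or a larger n both increase |t|, so even a weak correlation can be statistically significant in a large sample.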
State the assumptions when using Pearson product-moment correlation analysis.
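1) The observations are independent, i.e. each pair (xi, yi) is independent of the other pairs.
2) The pairs of observations (xi, yi) are randomly selected.
3) Both variables are normally distributed in the underlying population (strictly, (x, y) follows a bivariate normal distribution).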
State the assumptions when using Spearman rank correlation analysis.
1) The observations are independent, i.e. each pair (xi, yi) is independent of the other pairs.
2) The pairs of observations (xi, yi) are randomly selected.
Give an example of how to write the conclusion of a correlation analysis.
At the significance level of 0.05, there is a statistically significant negative correlation (r = -0.791, p < 0.0005) between the percentage of children who have been immunised against the infectious diseases diphtheria, pertussis, and tetanus (DPT) in a given country and its under-5 mortality rate.
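What is an advantage of using Spearman rank correlation over the Pearson product-moment correlation?
Less sensitive to outlying values as compared with Pearson product-moment correlation.
- Disadvantage: because only the ranks are used, information about the actual magnitudes of the observations is lost.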
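Spearman's lower sensitivity to outliers can be seen with a minimal sketch: replacing one y value with an extreme outlier that keeps the same ordering pulls Pearson's r from 1 down to about 0.59, while rs stays at 1 because the ranks do not change. The data and helper names (`pearson_r`, `average_ranks`, `spearman_rs`) are hypothetical; tied values receive the average of the ranks they span.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation coefficient r for paired samples."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def average_ranks(values):
    """Rank values 1..n; tied values get the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]]
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # 0-based positions i..j -> mean 1-based rank
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rs(x, y):
    """Spearman's rs, computed as Pearson's r on the ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))


x = list(range(1, 11))
y_clean = list(range(1, 11))
y_outlier = y_clean[:-1] + [100]  # last y replaced by a hypothetical outlier

for label, y in [("clean", y_clean), ("with outlier", y_outlier)]:
    print(f"{label}: r = {pearson_r(x, y):.3f}, rs = {spearman_rs(x, y):.3f}")
```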
How similar or different is performing the Spearman rank correlation as compared to the Pearson product-moment correlation?
Performing the Spearman rank correlation is equivalent to performing the Pearson product-moment correlation on the ranked values of x and y (i.e. using assigned ranks rather than the actual observations).
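The equivalence can be checked directly in a short sketch: rank x and y, then apply the Pearson formula to the ranks. For comparison it also evaluates the familiar shortcut rs = 1 - 6Σd² / (n(n² - 1)), where d is the difference between each pair's ranks; that shortcut is exact only when there are no ties. The data and function names are hypothetical, and the raw x values (10 to 50) are deliberately not ranks so that the ranking step matters.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation coefficient r for paired samples."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def average_ranks(values):
    """Rank values 1..n; tied values get the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman_rs(x, y):
    """Spearman's rs: Pearson's r applied to the ranks of x and y."""
    return pearson_r(average_ranks(x), average_ranks(y))


def spearman_shortcut(x, y):
    """rs = 1 - 6 * sum(d^2) / (n(n^2 - 1)); exact only when there are no ties."""
    n = len(x)
    d_squared = sum((rx - ry) ** 2 for rx, ry in zip(average_ranks(x), average_ranks(y)))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))


x = [10, 20, 30, 40, 50]
y = [2, 1, 4, 3, 5]
print(spearman_rs(x, y), spearman_shortcut(x, y))
```

When ties are present, the rank-based Pearson route still works, which is why it is the safer general definition.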