LEC 10 Correlation Flashcards
Correlation definition (3)
- the quantification of the degree to which 2 random variables (continuous or ordinal) are related
- variables must be numerical
- provided that the relationship is linear
(use a scatter plot to check for a potential linear relationship)
Scatter plot
- plot y against x
- useful for visually examining whether a relationship exists between 2 numerical variables
Correlation coefficient
- quantitative measure of the strength and direction of a linear relationship between 2 variables
Types of correlation coefficient (2)
& the type of data analysed
- Pearson product-moment correlation coefficient
- continuous normally distributed variables
- r
- Spearman rank correlation coefficient
- continuous non-normally distributed variables
- ordinal
- less sensitive to outlying values because it uses ranks rather than the actual values
- rs
Are r and rs dimensionless?
Yes, no units
Range of possible r and rs values
-1 to 1
What does the sign of r and rs indicate?
The direction of the linear relationship between the 2 variables
What does the magnitude of r and rs indicate?
The strength of the linear relationship between the 2 variables
Judged on the absolute value (sign ignored):
< 0.5 : weak
0.5 to < 0.7 : strong
>= 0.7 : very strong
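The cut-offs on this card can be sketched as a small lookup. This is only an illustration: `describe_correlation` is a hypothetical helper name, and the thresholds are the ones quoted on this card, not a universal standard.

```python
def describe_correlation(r):
    """Label a coefficient using this card's cut-offs (hypothetical helper)."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("a correlation coefficient must lie between -1 and 1")

    size = abs(r)  # strength depends on the magnitude only, not on the sign
    if size < 0.5:
        strength = "weak"
    elif size < 0.7:
        strength = "strong"
    else:
        strength = "very strong"

    # direction comes from the sign alone
    if r > 0:
        direction = "positive"
    elif r < 0:
        direction = "negative"
    else:
        direction = "none"
    return strength, direction


label = describe_correlation(-0.82)
```

Here `label` holds a (strength, direction) pair, so a negative coefficient is graded on its size first and its sign second.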
If r/rs = 0?
- means no linear correlation
- DOES NOT mean no relationship, because the relationship may be non-linear (e.g. a curve)
- check scatter plot for relationship
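A quick way to see this is a perfect curve that still gives r = 0. The sketch below uses made-up data (y = x squared on values symmetric about 0) and a hand-written `pearson_r` function, which is an illustrative name and not from the lecture.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson r from paired lists, using sums of squares about the means."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Made-up data: y is completely determined by x, but along a curve
x = [-2, -1, 0, 1, 2]
y = [v ** 2 for v in x]

r_curve = pearson_r(x, y)
```

The positive and negative halves of the curve cancel out, so `r_curve` comes out as zero even though y depends entirely on x. Only the scatter plot reveals the relationship.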
If r/rs = 1
Perfect positive correlation
If r/rs = -1
Perfect negative correlation
OR
Perfect inverse correlation
If r/rs > 0 (positive)
- means positive correlation
- both variables tend to increase together
If r/rs < 0 (negative)
- means negative/inverse correlation
- one variable increases as the other decreases
Potential misuse of the correlation coefficient (3)
- A correlation coefficient of 0 does not mean no relationship, it only means no LINEAR relationship
- A strong correlation coefficient does not necessarily imply “linearity”, as some parts of the graph might be non-linear
- Does not imply causation (cause-and-effect relationship)
Statistical test for correlation if BOTH variables are :
- continuous
- normally distributed
Pearson product-moment correlation
(parametric test)
To test the null hypothesis that there is no correlation between the 2 numerical variables
Pearson product-moment correlation assumptions (3)
- The observations (x, y pairs) are independent of one another
- The pairs of observations are randomly selected
- For Pearson product-moment correlation, the underlying populations of BOTH variables are normally distributed
Pearson product-moment correlation hypothesis
H0 :
- There is no correlation between the 2 variables
H1 :
- There is a correlation between the 2 variables (two-tailed)
- There is a positive correlation between the 2 variables (one-tailed)
- There is a negative correlation between the 2 variables (one-tailed)
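The lecture card does not give the test statistic, so the following is a sketch under an assumption: the usual test of "no correlation" for Pearson's r uses a t statistic with n - 2 degrees of freedom. `t_statistic` is a hypothetical helper name.

```python
from math import sqrt


def t_statistic(r, n):
    """t statistic and degrees of freedom for testing H0: no correlation.

    Assumes the standard form t = r * sqrt((n - 2) / (1 - r^2)), df = n - 2.
    """
    if n < 3:
        raise ValueError("need at least 3 pairs of observations")
    if abs(r) >= 1:
        raise ValueError("t is undefined when r is exactly -1 or 1")
    df = n - 2
    t = r * sqrt(df / (1 - r ** 2))
    return t, df


# Made-up example: r of about 0.775 from 5 pairs of observations
t, df = t_statistic(sqrt(0.6), 5)
```

The p-value is then read from the t distribution with `df` degrees of freedom, using both tails for the two-tailed H1 and one tail for a one-tailed H1. In practice statistical software reports r and the p-value together.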
Spearman Rank correlation assumptions (2)
- The observations (x, y pairs) are independent of one another
- The pairs of observations are randomly selected
Spearman Rank correlation process
Involves ranking the values of x and y
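The ranking step can be sketched as follows, assuming tied values share the average of their ranks and that rs is Pearson's r computed on the ranks. The function names and the data are made up for illustration.

```python
from math import sqrt


def average_ranks(values):
    """Rank from smallest (1) to largest; tied values share the mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j to the end of the current run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson_r(x, y):
    """Pearson r from paired lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def spearman_rs(x, y):
    """Spearman rs: rank each variable separately, then correlate the ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))


# Made-up data with one extreme x value
x = [1, 2, 3, 4, 100]
y = [2, 3, 5, 6, 7]

r = pearson_r(x, y)      # uses the actual values, so the 100 matters
rs = spearman_rs(x, y)   # uses ranks, so 100 is simply "rank 5"
```

Because y always increases with x here, the two sets of ranks match exactly and `rs` is a perfect positive correlation, while `r` is pulled below that by the extreme value.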
Correlation coefficient and p-value
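Correlation coefficient ≠ p-value
- the coefficient measures the strength and direction of the relationship
- the p-value measures the evidence against the null hypothesis of no correlation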
What if your data is :
- 1 continuous normally distributed
- 1 ordinal / continuous non-normally distributed
OR
- both are ordinal / continuous non-normally distributed
?
Use Spearman Rank correlation
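The choice between the two tests on these cards reduces to one rule: Pearson only when both variables are continuous and normally distributed, otherwise Spearman. A minimal sketch, where `choose_correlation` and the three type labels are hypothetical names:

```python
# Hypothetical labels for the data types named on these cards
KINDS = {"continuous_normal", "continuous_non_normal", "ordinal"}


def choose_correlation(x_kind, y_kind):
    """Return which correlation to use for a pair of variable types."""
    if x_kind not in KINDS or y_kind not in KINDS:
        raise ValueError("unknown variable type")
    # Pearson needs BOTH variables continuous and normally distributed
    if x_kind == "continuous_normal" and y_kind == "continuous_normal":
        return "Pearson"
    # any ordinal or non-normal variable means Spearman
    return "Spearman"


choice = choose_correlation("continuous_normal", "ordinal")
```

A single ordinal or non-normal variable is enough to switch the choice to Spearman.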
Correlation assumption (1)
Linear relationship between the 2 variables
Can you perform correlation analysis if scatter plot shows non-linear relationship?
No.
Correlation analysis assumes a linear relationship between the 2 variables.
If a statistical report shows p = 0.000
Report it as p < 0.0005 (the software has rounded to 3 decimal places; a p-value is never exactly 0)
Advantage of Spearman Rank Correlation
Less sensitive to outlying values as compared to Pearson Product Moment Correlation
- uses ranks rather than the actual values