Week 6: Correlation Flashcards
What does correlation measure in comparison to linear regression?
Linear regression estimates the best-fitting straight line, while correlation measures the strength and direction of the linear association
What is the range of the correlation coefficient (r)?
-1 to 1
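A minimal sketch of how r can be computed by hand, to illustrate the two ends of the range. The function name pearson_r and the sample data are illustrative assumptions, not part of the course material.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient for two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products of deviations, and sums of squared deviations
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: one perfectly increasing and one perfectly decreasing line
hours = [1, 2, 3, 4, 5]
rising = [2, 4, 6, 8, 10]
falling = [10, 8, 6, 4, 2]

r_up = pearson_r(hours, rising)     # upper end of the range
r_down = pearson_r(hours, falling)  # lower end of the range
print(r_up, r_down)
```

Real data rarely fall exactly on a line, so r usually lies strictly between these two extremes.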
What does r = 0 signify in correlation?
There is no linear correlation between the variables
How are strength levels of correlation categorised?
- Very strong ± 0.90 to ± 1
- Strong ± 0.70 to ± 0.89
- Moderate ± 0.40 to ± 0.69
- Weak ± 0.10 to ± 0.39
- No or very weak ± 0 to ± 0.09
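The bands above can be sketched as a small lookup. This is only an illustration: the function name correlation_strength is hypothetical, and values that fall between the listed bands (for example 0.895) are assigned here by lower cut-off, which is an assumption rather than something the list specifies.

```python
def correlation_strength(r):
    """Return the strength label for a correlation coefficient r."""
    if not -1 <= r <= 1:
        raise ValueError("r must lie between -1 and 1")
    size = abs(r)  # the sign gives direction, not strength
    if size >= 0.90:
        return "very strong"
    if size >= 0.70:
        return "strong"
    if size >= 0.40:
        return "moderate"
    if size >= 0.10:
        return "weak"
    return "no or very weak"

print(correlation_strength(0.76))
print(correlation_strength(-0.95))
```

The sign of r is ignored when labelling strength, so r = -0.95 and r = 0.95 receive the same label.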
What does the coefficient of determination (R2) indicate?
The proportion of variance shared between two variables. Multiplying R2 by 100 gives the percentage of variability explained by the predictor
How is R2 calculated?
By squaring the coefficient (r)
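A short worked example of the calculation, using r = 0.76 purely for illustration. The function name coefficient_of_determination is an assumption of this sketch.

```python
def coefficient_of_determination(r):
    """Square the correlation coefficient to obtain R2."""
    return r ** 2

r = 0.76
r_squared = coefficient_of_determination(r)
# Multiply by 100 to express R2 as a percentage of shared variance
percent_explained = round(r_squared * 100, 2)
print(f"R2 = {r_squared:.4f}, about {percent_explained}% of the variance")
```

Because r is squared, a negative and a positive correlation of the same size give the same R2.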
What is H0 in correlation hypothesis testing?
There is no correlation between the variables in the population (the population correlation is 0)
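One common way to test this H0 is a t statistic with n - 2 degrees of freedom. The sketch below computes only the statistic and the degrees of freedom; the sample size n = 20 and the value r = 0.76 are hypothetical, and the p-value would normally be looked up in a t table or obtained from statistical software.

```python
import math

def correlation_t_statistic(r, n):
    """t statistic for testing H0: no correlation, with n paired observations."""
    if n < 3:
        raise ValueError("need at least 3 paired observations")
    if abs(r) >= 1:
        raise ValueError("t is undefined when r is exactly -1 or 1")
    df = n - 2  # degrees of freedom
    t = r * math.sqrt(df / (1 - r ** 2))
    return t, df

# Hypothetical values chosen for illustration only
t, df = correlation_t_statistic(0.76, 20)
print(f"t = {t:.2f} on {df} degrees of freedom")
```

A larger absolute t gives stronger evidence against H0, and for the same r a larger sample produces a larger t.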
What are the key assumptions for Pearson correlation?
- Random sample
- Continuous data
- Paired sample data
- Independence of observations
- Approximate normal distribution
- Linear association
- Absence of outliers
What alternatives exist if Pearson correlation assumptions are violated?
Use non-parametric methods like Spearman Rank correlation or Kendall’s tau
How does Spearman Rank correlation differ from Pearson correlation?
It assesses monotonic relationships (whether linear or not) by ranking the values instead of using the original measurements. It is also used when data are measured on an ordinal scale
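To illustrate the difference, the sketch below computes Spearman's coefficient as a Pearson correlation on ranks (tied values share the average rank). The helper names and the cubic example data are assumptions made for this illustration.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient for two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def average_ranks(values):
    """Rank values from 1 upwards; tied values receive the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))

# Hypothetical monotonic but non-linear data: y = x cubed
x = [1, 2, 3, 4, 5]
y = [1, 8, 27, 64, 125]
pearson = pearson_r(x, y)
spearman = spearman_rho(x, y)
print(f"Pearson: {pearson:.3f}, Spearman: {spearman:.3f}")
```

Because y always increases with x, the ranks agree perfectly and Spearman's coefficient reaches its maximum, while Pearson's r is pulled below 1 by the curvature.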
Why is visual inspection of data important before calculating correlation?
To confirm linearity and check for outliers. If there are outliers, consider whether removing them can be justified
Why is correlation not equivalent to causation?
Correlation only indicates a statistical relationship, not a cause-effect link
What does a strong positive Pearson correlation coefficient (e.g., r = 0.76) suggest?
A strong positive linear relationship between two variables
How should missing data in paired samples be handled when calculating r?
Use complete case analysis (only cases with data on both x and y are included), but be aware of potential bias
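A minimal sketch of complete case filtering, assuming missing values are recorded as None. The function name complete_cases and the data are hypothetical.

```python
def complete_cases(x, y):
    """Keep only the pairs where both x and y were observed."""
    return [(a, b) for a, b in zip(x, y) if a is not None and b is not None]

# Hypothetical paired measurements with gaps marked as None
x = [1.2, None, 3.1, 4.0, 5.5]
y = [2.0, 2.5, None, 4.1, 6.0]

pairs = complete_cases(x, y)
dropped = len(x) - len(pairs)
print(f"kept {len(pairs)} pairs, dropped {dropped}")
```

Reporting how many pairs were dropped makes it easier to judge whether the remaining cases could be biased.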
What is the main visual check to perform before calculating Pearson correlation?
Ensure the data appear linear in a scatter plot