RESS I: Data Analysis #3 Flashcards
What is correlation?
Correlation measures the strength of the linear relationship between two numerical variables.
When is the Pearson Correlation Coefficient calculated? When is the Spearman’s Rank Correlation Test done?
Pearson’s: To find the association between two normally-distributed variables.
Spearman’s: When the variables are not normally distributed. Spearman’s correlation is also less sensitive to strong outliers than Pearson’s correlation.
What are the requirements of the Pearson Correlation Coefficient?
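The outlier point can be illustrated with a small sketch. The data below are made up for illustration, and the helper names (pearson_r, ranks, spearman_rho) are hypothetical, not from any particular library. Spearman’s coefficient is computed here as Pearson’s r applied to the ranks of the data.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def ranks(values):
    """Ranks starting at 1; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result

def spearman_rho(x, y):
    """Spearman's rank correlation: Pearson's r computed on the ranks."""
    return pearson_r(ranks(x), ranks(y))

# Made-up data: a steadily increasing trend with one extreme final value.
x = [1, 2, 3, 4, 5, 6]
y = [2, 3, 5, 6, 8, 60]

r = pearson_r(x, y)       # pulled well below 1 by the outlier
rho = spearman_rho(x, y)  # the ranks are unaffected by how extreme the value is
print(round(r, 2), round(rho, 2))
```

Because the outlier only changes how large the last value is, not its rank, Spearman’s coefficient still reports a perfect monotonic relationship while Pearson’s r drops to roughly 0.72.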
- The relationship must be linear
- The variables must be normally distributed
- The direction of the relationship will be positive or negative
- The strength of the relationship will be from low to high (0 to 1)
How do you interpret ‘r’, the Pearson’s Correlation Coefficient?
If r > 0 we have a positive correlation, implying that as one variable increases so does the other.
If r < 0 we have a negative correlation, implying that as one variable increases the other decreases.
If r = 0 we have no correlation, implying there is no linear association between the two variables.
If r = +1 there is a perfect positive correlation.
If r = -1 there is a perfect negative correlation.
When should you not use correlation tests?
- When the relationship is non-linear
- When outliers are present
- When there are distinct sub-groups, e.g. healthy controls with diseased cases
How would you describe the association between one numerical and one categorical variable?
Mean (or median) difference measures the strength of the relationship between one numerical variable and one categorical variable.
This can then be demonstrated in a comparative box plot or histogram.
What statistical association tests can be done between one numerical and one categorical variable?
- Independent samples t-test (also called the two-sample t-test)
- Mann-Whitney U test
What is the t-test?
This test measures the association between one normally-distributed variable and one binary variable:
- Difference between two means
- Direction of the relationship: positive or negative compared to control
- Strength/magnitude of the relationship: low to high (0 to infinity), in units of the continuous variable
What are the assumptions of the t-test?
- Two independent groups
- Numerical variable is Normally distributed in both groups
- Similar standard deviations in both groups
What is the t-statistic?
The mean difference divided by the standard error of the mean difference.
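As a rough sketch of that ratio, the snippet below computes a two-sample t-statistic using a pooled variance, which is one common choice when the two groups have similar standard deviations. The function name pooled_t_statistic and the two small groups are made up for illustration.

```python
from math import sqrt
from statistics import mean, variance

def pooled_t_statistic(group_a, group_b):
    """t = (mean difference) / (standard error of the mean difference)."""
    n_a, n_b = len(group_a), len(group_b)
    mean_diff = mean(group_a) - mean(group_b)
    # Pooled sample variance: assumes both groups share a similar spread.
    pooled_var = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    se_diff = sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return mean_diff / se_diff

# Made-up measurements for a control group and a treated group.
control = [4, 5, 6, 5, 5]
treated = [6, 7, 8, 7, 7]

t = pooled_t_statistic(treated, control)
print(round(t, 3))
```

Here the mean difference is 2 and its standard error is about 0.447, giving a t-statistic of about 4.47. The sign of t follows the order of the groups, so swapping them flips the sign.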
How do you describe the association between two categorical variables?
Proportion difference measures the strength of the relationship between two categorical variables.
What tests are done to compare two categorical variables?
- Chi-squared test (standard)
- Chi-squared test (continuity/Yates’ correction)
- Fisher’s exact test
What is the Chi-squared test used for?
To test whether there is a statistically significant difference between the observed and expected counts.
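A minimal sketch of the observed-versus-expected comparison, assuming the expected counts are those implied by independence (row total times column total, divided by the grand total). The table values and the function names are made up for illustration.

```python
def expected_counts(table):
    """Expected counts under independence for a table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    return [[r * c / total for c in col_totals] for r in row_totals]

def chi_squared(table):
    """Sum over all cells of (observed - expected)^2 / expected."""
    expected = expected_counts(table)
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )

# Made-up 2x2 table: rows are two groups, columns are outcome yes / no.
observed = [[30, 20],
            [20, 30]]

stat = chi_squared(observed)
print(expected_counts(observed), stat)
```

Every expected count in this example is 25, so each cell contributes 25/25 = 1 and the statistic is 4. The statistic would then be compared against a chi-squared distribution with (rows - 1) x (columns - 1) degrees of freedom, which is 1 for a 2x2 table.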
What is the continuity correction (Yates’ correction)?
For small sample sizes the chi-squared test is too likely to reject the null hypothesis. A continuity correction can be made to allow for this.
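The effect of the correction can be sketched for a 2x2 table as follows. This assumes the usual form of Yates’ correction, where 0.5 is subtracted from each absolute difference between observed and expected before squaring; flooring the adjusted difference at zero is an extra assumption made here. The table and the function name chi_squared_2x2 are made up for illustration.

```python
def chi_squared_2x2(table, yates=False):
    """Chi-squared statistic for a 2x2 table, optionally Yates-corrected."""
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            diff = abs(observed - expected)
            if yates:
                # Shrink each difference by 0.5, never below zero (assumption).
                diff = max(diff - 0.5, 0.0)
            stat += diff ** 2 / expected
    return stat

# Made-up small-sample table.
small = [[8, 2],
         [3, 7]]

plain = chi_squared_2x2(small)
corrected = chi_squared_2x2(small, yates=True)
print(round(plain, 3), round(corrected, 3))
```

For this table the uncorrected statistic is about 5.05 and the corrected one about 3.23, so the correction makes the test less likely to reject the null hypothesis.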
Summarise statistical tests for different variable relationships.
Two continuous = correlations
One continuous, one binary = t-tests
Two binary = chi-squared tests