Association Flashcards
What is the Pearson correlation coefficient?
- Measure of linear correlation between two numeric variables.
- It is a measure of how well the data fit a straight line.
- Its value lies between -1 and +1
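As a rough illustration of the definition, here is a minimal sketch that computes r directly: the covariance of x and y divided by the product of their spreads. The function name `pearson_r` and the example data are hypothetical. Points lying exactly on a rising line give r = 1, and points on a falling line give r = -1.

```python
import math

def pearson_r(xs, ys):
    """Pearson's r: sum of cross-products of deviations / sqrt(product of sums of squares)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least two points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when a variable is constant")
    return sxy / math.sqrt(sxx * syy)

print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))  # points on a rising line: 1.0
print(pearson_r([1, 2, 3, 4], [8, 6, 4, 2]))  # points on a falling line: -1.0
```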
Describe the different Pearson correlation coefficient values that can be obtained and what they mean
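- r > 0: a positive correlation; as one variable increases, the other tends to increase.
- r < 0: a negative correlation; as one variable increases, the other tends to decrease.
- r = 0: no linear correlation between the variables.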
When should the Pearson correlation coefficient not be used?
- There is a non-linear relationship between variables
- There are outliers
- There are distinct sub-groups (if we mix two samples together such as healthy controls and disease cases)
- One or both of the variables is not normally distributed.
- One or both of the variables is non-numeric.
When should the Pearson correlation coefficient be calculated/used? What is the alternative?
- Only between two numeric variables that are both normally distributed
- The alternative is Spearman’s rank correlation coefficient
When can Spearman’s rank correlation coefficient be calculated/used?
- When the data is not normally distributed
- When one or both of the variables are ordinal
- When the sample size is small
What are the four possible explanations if two variables are correlated?
- The result occurred by chance
- A influences (or ‘causes’) B
- B influences (or ‘causes’) A
- A and B are influenced by some other variable(s), C
How can two variables A and B be influenced by some other variable(s) C?
- C may ‘cause’ both A and B
- A may lead to an increase in C, which ‘causes’ B
Define linear regression
Fitting a straight line to points on a scatterplot
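The card does not say how the line is chosen; the usual method is ordinary least squares, which this sketch assumes. It estimates the slope as the sum of cross-products of deviations divided by the sum of squared x deviations, and the intercept from the two means. `fit_line` and the data points are hypothetical.

```python
def fit_line(xs, ys):
    """Least-squares estimates of intercept a and slope b for y = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x  # the line passes through (mean_x, mean_y)
    return intercept, slope

xs = [1, 2, 3, 4, 5]             # independent variable (x-axis)
ys = [2.1, 3.9, 6.2, 7.8, 10.0]  # dependent variable (y-axis)
intercept, slope = fit_line(xs, ys)
print(f"y = {intercept:.2f} + {slope:.2f} x")  # y = 0.09 + 1.97 x
```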
Where can the independent and dependent variables be found on a scatterplot?
- Independent: x-axis
- Dependent: y-axis
What are residuals?
The difference between each observed value and the value predicted by the model
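As a concrete sketch, the code below fits a least-squares line (an assumed fitting method) and returns each residual as observed y minus predicted y. The helper name `residuals` and the data are hypothetical. For a least-squares line with an intercept, the residuals sum to zero apart from rounding error.

```python
def residuals(xs, ys):
    """Fit y = a + b*x by least squares, then return observed minus predicted y."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return [y - (intercept + slope * x) for x, y in zip(xs, ys)]

xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.0]
res = residuals(xs, ys)
print([round(r, 2) for r in res])  # [0.04, -0.13, 0.2, -0.17, 0.06]
```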
What assumptions are made in regression analysis?
- The relationship must be approximately linear
- The residuals have to be normally distributed
What is a contingency table used for?
Examining the association between two categorical variables
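A contingency table is just a cross-tabulation of counts. This sketch builds a 2×2 table from two categorical variables recorded for each subject; the case/control and smoker labels and the data are hypothetical.

```python
from collections import Counter

# Hypothetical data: one (group, exposure) pair per subject
groups = ["case", "case", "case", "control", "control", "control", "control", "case"]
exposure = ["smoker", "smoker", "non-smoker", "non-smoker",
            "non-smoker", "smoker", "non-smoker", "smoker"]

counts = Counter(zip(groups, exposure))
rows = sorted(set(groups))    # ["case", "control"]
cols = sorted(set(exposure))  # ["non-smoker", "smoker"]

# table[i][j] = number of subjects in row category i and column category j
table = [[counts[(r, c)] for c in cols] for r in rows]

print("", *cols, sep="\t")
for r, row in zip(rows, table):
    print(r, *row, sep="\t")
```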
What hypothesis test can be used on a contingency table? Describe it
- Chi-squared test
- It compares the observed contingency table with the table that would be expected if the null hypothesis (no association between the variables) were true.
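The comparison can be sketched for a 2×2 table as follows: each expected count is row total × column total / grand total, and the statistic sums (observed − expected)² / expected over the cells. The p-value line relies on the fact that with 1 degree of freedom the chi-squared tail probability equals erfc(√(statistic/2)), so this shortcut only applies to 2×2 tables. The function name and table are hypothetical.

```python
import math

def chi_squared_2x2(table):
    """Pearson's chi-squared statistic and p-value for a 2x2 table (1 degree of freedom)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n  # count expected under H0
            stat += (observed - expected) ** 2 / expected
    # With 1 degree of freedom, P(chi-squared > stat) = erfc(sqrt(stat / 2))
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value

stat, p = chi_squared_2x2([[20, 30], [30, 20]])
print(f"chi-squared = {stat:.2f}, p = {p:.4f}")  # chi-squared = 4.00, p = 0.0455
```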
What are the conditions for the chi-squared test?
For a 2×2 table, the expected value in each of the four cells should be greater than 1, and in at least three of the four cells it should be greater than 5.
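The rule above can be checked mechanically by computing the expected counts first. This sketch encodes exactly that rule for a 2×2 table; the function names and example tables are hypothetical.

```python
def expected_counts(table):
    """Expected counts under the null hypothesis: row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

def chi_squared_conditions_met(table):
    """All expected counts > 1 and at least three of the four expected counts > 5."""
    cells = [e for row in expected_counts(table) for e in row]
    return all(e > 1 for e in cells) and sum(e > 5 for e in cells) >= 3

print(chi_squared_conditions_met([[2, 13], [8, 27]]))  # expected 3, 12, 7, 28: True
print(chi_squared_conditions_met([[1, 9], [4, 46]]))   # one expected count is about 0.83: False
```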
What is the continuity correction?
- Also known as Yates’s correction
- It is applied because, for small sample sizes, the uncorrected chi-squared test is too likely to reject the null hypothesis
- The chi-squared conditions still have to be met.
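Yates’s correction reduces each |observed − expected| by 0.5 before squaring, which makes the statistic smaller and the test more conservative. The sketch below compares both versions on a hypothetical 2×2 table. Clamping the reduced difference at zero is a common convention that is assumed here, and the erfc p-value shortcut again assumes 1 degree of freedom.

```python
import math

def chi_squared_2x2(table, yates=False):
    """Chi-squared statistic and p-value for a 2x2 table, optionally with Yates's correction."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            diff = abs(observed - expected)
            if yates:
                # Yates's correction: shrink |O - E| by 0.5, but not below zero
                diff = max(diff - 0.5, 0.0)
            stat += diff ** 2 / expected
    p_value = math.erfc(math.sqrt(stat / 2))  # valid for 1 degree of freedom
    return stat, p_value

table = [[20, 30], [30, 20]]
for yates in (False, True):
    stat, p = chi_squared_2x2(table, yates)
    print(f"yates={yates}: chi-squared = {stat:.2f}, p = {p:.4f}")
```

For this table the correction lowers the statistic from 4.00 to 3.24 and raises the p-value from about 0.046 to about 0.072, so the result no longer crosses the conventional 0.05 threshold.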